Posted to dev@kylin.apache.org by Vineet Mishra <cl...@gmail.com> on 2015/06/11 10:28:02 UTC

Kylin Query Latency and Number of Parallel Queries

Hi,

I am trying Kylin for one of my use cases, where the data cube is 110 MB
with 5 million records. A query for the full data takes around a minute,
which seems like a very long time. Apart from this, I was wondering how
many queries Kylin can handle in parallel.

For instance, how many queries can be fired in parallel at our aggregated
data cubes, and are there any practices that improve query performance?

Urgent Call!

Thanks!

Re: Kylin Query Latency and Number of Parallel Queries

Posted by Vineet Mishra <cl...@gmail.com>.

Thanks Luke! :)

On Sat, Jun 20, 2015 at 11:44 AM, Luke Han <lu...@gmail.com> wrote:

> Hi Vineet,
>     I got it, please feel free to continue posting your questions here. We
> are happy to help, but frankly speaking, we can't guarantee the response
> time since we also have internal tasks. But we will try our best to help
> everyone use Kylin smoothly.
>     For your case, concurrency should not be an issue if you can control
> the queries coming from Tableau, which means not allowing a Tableau
> dashboard/report to pull huge data in one query. For example, please use
> "connect live", not "import data", in Tableau.
>     Also, please set up more nodes to serve high-concurrency requests;
> Kylin's REST server is stateless, so it scales out very well.
>
>     If you have any issues, please let us know.
>     Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han

Re: Kylin Query Latency and Number of Parallel Queries

Posted by Luke Han <lu...@gmail.com>.

Hi Vineet,
    I got it, please feel free to continue posting your questions here. We
are happy to help, but frankly speaking, we can't guarantee the response
time since we also have internal tasks. But we will try our best to help
everyone use Kylin smoothly.
    For your case, concurrency should not be an issue if you can control
the queries coming from Tableau, which means not allowing a Tableau
dashboard/report to pull huge data in one query. For example, please use
"connect live", not "import data", in Tableau.
    Also, please set up more nodes to serve high-concurrency requests;
Kylin's REST server is stateless, so it scales out very well.
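
To illustrate the stateless point: since no node keeps per-session state,
any node can answer any query, so requests can simply be rotated across
nodes. The sketch below is only an illustration under assumptions; the
class name and the three host names are hypothetical, and in practice a
load balancer in front of the nodes would usually do this job.

```python
import itertools


class RoundRobinNodes:
    """Hand out Kylin query-node base URLs in rotation (hypothetical helper)."""

    def __init__(self, base_urls):
        urls = list(base_urls)
        if not urls:
            raise ValueError("at least one node URL is required")
        # cycle() repeats the list forever, one URL per call to next().
        self._cycle = itertools.cycle(urls)

    def next_url(self):
        return next(self._cycle)


# Hypothetical host names; 7070 is assumed as the port of each node.
nodes = RoundRobinNodes([
    "http://kylin-node1:7070",
    "http://kylin-node2:7070",
    "http://kylin-node3:7070",
])

# Four consecutive queries go to node1, node2, node3, then node1 again.
targets = [nodes.next_url() for _ in range(4)]
```

Adding a node then only means adding one more URL to the list; nothing has
to be migrated between nodes.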

    If you have any issues, please let us know.
    Thanks.




Best Regards!
---------------------

Luke Han

On Fri, Jun 19, 2015 at 5:48 PM, Vineet Mishra <cl...@gmail.com>
wrote:

> Thanks Luke for the prompt response.
>
> As the Kylin project is still in incubation with a comparatively less
> active mailing list, and my project has already passed its expected
> delivery date, I had to put it that way! :)
>
> Well, my use case is to get aggregated data across various dimensions and
> visualize it in Tableau. The visualization will be accessed by hundreds of
> users (or even more) and the connection will be live, so multiple
> concurrent queries are expected.

Re: Kylin Query Latency and Number of Parallel Queries

Posted by Vineet Mishra <cl...@gmail.com>.

Thanks Luke for the prompt response.

As the Kylin project is still in incubation with a comparatively less
active mailing list, and my project has already passed its expected
delivery date, I had to put it that way! :)

Well, my use case is to get aggregated data across various dimensions and
visualize it in Tableau. The visualization will be accessed by hundreds of
users (or even more) and the connection will be live, so multiple
concurrent queries are expected.

On Fri, Jun 19, 2015 at 11:57 PM, Luke Han <lu...@gmail.com> wrote:

> Hi Vineet,
>     Pulling 5 million records in one query will take a long time, and it
> is not the recommended way to use Kylin.
>     In our internal performance testing, Kylin could handle hundreds of QPS
> for small queries on a single machine with several Tomcat instances. Please
> refer to these slides (p. 31) for more detail:
>
> http://www.slideshare.net/lukehan/apache-kylin-big-data-technology-conference-2014-beijing-v2
>
>     Kylin is not a general-purpose database; it serves certain cases well.
> Please evaluate your requirements, use case, and data. We would appreciate
> it if you could share more detail about your case, so that we have a
> clearer idea of how to help you :)
>
>     BTW, is "Urgent Call!" your signature, or is it really urgent? I saw it
> in every thread of yours and was wondering about it :-)
>
>     Thank you very much
>
> Luke
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han

Re: Kylin Query Latency and Number of Parallel Queries

Posted by Luke Han <lu...@gmail.com>.

Hi Vineet,
    Pulling 5 million records in one query will take a long time, and it
is not the recommended way to use Kylin.
    In our internal performance testing, Kylin could handle hundreds of QPS
for small queries on a single machine with several Tomcat instances. Please
refer to these slides (p. 31) for more detail:
http://www.slideshare.net/lukehan/apache-kylin-big-data-technology-conference-2014-beijing-v2
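
As an example of what a "small query" could look like, here is a rough
Python sketch that builds (but does not send) a request for Kylin's REST
query endpoint. It assumes the POST /kylin/api/query endpoint with the
sql, project, offset and limit fields as described in the Kylin REST API
documentation, so please check it against your version. The host, project
name, table, columns and the ADMIN/KYLIN credentials are placeholders, and
build_kylin_query is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, limit=500,
                      user="ADMIN", password="KYLIN"):
    """Build (but do not send) a POST request for Kylin's REST query endpoint."""
    # A row limit keeps one dashboard query from pulling the whole cube.
    payload = {"sql": sql, "project": project, "offset": 0, "limit": limit}
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical table and columns: an aggregate over two dimensions returns
# one row per (region, sale_date) pair instead of all 5 million records.
SMALL_QUERY = (
    "SELECT region, sale_date, SUM(amount) AS total "
    "FROM sales GROUP BY region, sale_date"
)

request = build_kylin_query("http://localhost:7070", "my_project", SMALL_QUERY)
# To actually run it against a live server:
#   body = urllib.request.urlopen(request).read()
```

The point is the shape of the SQL: grouping by a few dimensions lets the
pre-aggregated cube answer it, while a query for the full data does not
benefit from the cube in the same way.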

    Kylin is not a general-purpose database; it serves certain cases well.
Please evaluate your requirements, use case, and data. We would appreciate
it if you could share more detail about your case, so that we have a
clearer idea of how to help you :)

    BTW, is "Urgent Call!" your signature, or is it really urgent? I saw it
in every thread of yours and was wondering about it :-)

    Thank you very much

Luke



Best Regards!
---------------------

Luke Han

On Fri, Jun 19, 2015 at 7:51 AM, Adunuthula, Seshu <sa...@ebay.com>
wrote:

> Sizing and tuning HBase requires some skill, but there is a lot of help
> available on the web. Here are some basic principles to begin with.
>
> 1. Do not colocate HBase Region Servers and MapReduce on the same nodes.
> Shut down the Node Managers on the nodes running the Region Servers. This
> reduces your MR capacity but makes your HBase a lot more stable.
> 2. Size your Region Servers correctly. Here is a great blog by Lars on
> this subject.
> https://www.quora.com/HBase-Region-Server-guidelines-give-a-size-range-of-about-1TB-whereas-data-nodes-are-configured-20-times-bigger-Why
>
> Regards
> Seshu Adunuthula

Re: Kylin Query Latency and Number of Parallel Queries

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.

Sizing and tuning HBase requires some skill, but there is a lot of help
available on the web. Here are some basic principles to begin with.

1. Do not colocate HBase Region Servers and MapReduce on the same nodes.
Shut down the Node Managers on the nodes running the Region Servers. This
reduces your MR capacity but makes your HBase a lot more stable.
2. Size your Region Servers correctly. Here is a great blog by Lars on
this subject.
https://www.quora.com/HBase-Region-Server-guidelines-give-a-size-range-of-about-1TB-whereas-data-nodes-are-configured-20-times-bigger-Why

Regards
Seshu Adunuthula

On 6/19/15, 3:12 AM, "Li Yang" <li...@apache.org> wrote:

>In the end, HBase is the bottleneck for the number of parallel queries,
>because every query is translated into one or more HBase scans. Assuming
>not much online processing is required (the data is pre-aggregated,
>right?), the HBase scan will be the bottleneck.

Re: Kylin Query Latency and Number of Parallel Queries

Posted by Li Yang <li...@apache.org>.

In the end, HBase is the bottleneck for the number of parallel queries,
because every query is translated into one or more HBase scans. Assuming
not much online processing is required (the data is pre-aggregated,
right?), the HBase scan will be the bottleneck.
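
A back-of-the-envelope way to reason about this is Little's Law: the
average number of scans in flight equals the scan arrival rate times the
time each scan takes. The numbers below are made-up assumptions for
illustration only, not measurements from Kylin or HBase, and
concurrent_scans is a hypothetical helper name.

```python
def concurrent_scans(queries_per_second, scans_per_query, scan_millis):
    """Little's Law: scans in flight = scan arrival rate x time per scan."""
    scans_per_second = queries_per_second * scans_per_query
    return scans_per_second * scan_millis / 1000


# Assumed workload: 50 queries per second, each translated into 2 HBase
# scans, each scan taking 200 ms on average.
in_flight = concurrent_scans(queries_per_second=50,
                             scans_per_query=2,
                             scan_millis=200)
# in_flight is the average number of scans HBase must serve at once.
```

If the scans get slower under load, the number in flight grows with them,
which is why the HBase side tends to decide how many parallel queries are
practical.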

On Thu, Jun 11, 2015 at 5:34 PM, Shi, Shaofeng <sh...@ebay.com> wrote:

> Recommended reading:
>
> http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin

Re: Kylin Query Latency and Number of Parallel Queries

Posted by "Shi, Shaofeng" <sh...@ebay.com>.

Recommended reading:

http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin


On 6/11/15, 4:28 PM, "Vineet Mishra" <cl...@gmail.com> wrote:

>Hi,
>
>I am trying Kylin for one of my use cases, where the data cube is 110 MB
>with 5 million records. A query for the full data takes around a minute,
>which seems like a very long time. Apart from this, I was wondering how
>many queries Kylin can handle in parallel.
>
>For instance, how many queries can be fired in parallel at our aggregated
>data cubes, and are there any practices that improve query performance?
>
>Urgent Call!
>
>Thanks!