Posted to user@livy.apache.org by "Harsch, Tim" <Ti...@Teradata.com> on 2018/08/01 00:04:31 UTC

How to tune Livy for fast queries

I have a Livy application that I'm trying to tune because I'm seeing a performance issue with fast queries. I've wrapped my queries with a timer that logs the time taken. The Spark code typically takes 50ms to 150ms to execute. I'm polling Livy every 500ms for my response, and generally the result isn't available until the third check. It seems Livy itself is spending up to an extra 1000ms. Where is Livy spending this time? Are there any tuning parameters I can adjust?
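
To make the arithmetic concrete, here is a minimal Python sketch of when a fixed-interval poller first sees a finished result. The function name is hypothetical, and it assumes the first poll happens at 0ms and that a poll sees the result as soon as it is ready; none of this is taken from Livy itself.

```python
import math

def first_successful_poll_ms(ready_at_ms, interval_ms, first_poll_ms=0):
    """Return the time of the first poll at or after the result is ready.

    Assumption: polls fire at first_poll_ms, first_poll_ms + interval_ms, ...
    """
    if ready_at_ms <= first_poll_ms:
        return first_poll_ms
    polls_needed = math.ceil((ready_at_ms - first_poll_ms) / interval_ms)
    return first_poll_ms + polls_needed * interval_ms

# A result ready at 150ms should be seen by the second check (at 500ms).
print(first_successful_poll_ms(150, 500))   # 500
# Succeeding only on the third check (at 1000ms) implies the result became
# visible to the poller somewhere after 500ms and up to 1000ms.
print(first_successful_poll_ms(501, 500))   # 1000
```

Under these assumptions, a 50ms to 150ms job that only succeeds on the third check was not visible until more than 500ms after submission, which is where the "extra" time estimate comes from.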


Also, I am having difficulty changing any of the settings in livy-client.conf. I placed the file in both /etc/hadoop/conf and the livy/conf folder, but my settings seem to be ignored.


Thanks

Tim

Re: How to tune Livy for fast queries

Posted by "Harsch, Tim" <Ti...@Teradata.com>.
I've looked a little deeper and now see my error: those parameters are for the Python and Java clients (clearly). I forgot there were clients in the code base. Just wishful thinking on my part, I guess...


In any case, I'm still hoping to understand where Livy overhead on queries is coming from.


________________________________
From: Harsch, Tim <Ti...@Teradata.com>
Sent: Thursday, August 2, 2018 8:28:58 AM
To: user@livy.incubator.apache.org
Subject: Re: How to tune Livy for fast queries


Thank you Saisai for your response.


    I did have a chance to investigate further, and I should give a little background on why I feel network cost is not the issue:
    I added Livy to our application, Kylo (http://kylo.io), as an optional Spark server that can be used in place of our existing Spark server. I noticed the performance issues when I use Livy instead of our pre-existing server. Kylo's spark-shell would consistently execute queries quickly (e.g. <100ms), while the same queries would take longer (>1500ms) through Livy with a 500ms polling interval (first poll at 0ms). This led me to write Python code that polls Livy more frequently (every 50ms) and to wrap the Scala code executed in Livy with a timer method that writes the time taken to the Livy logs. I notice that my faster queries execute in Livy in <50ms, yet Livy doesn't return the results for at least 350ms (7 requests for results made, 6 answered as pending). I feel fairly confident that Livy has some overhead other than network.
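
The measurement described above could be sketched roughly like this in Python. `fetch_state` is a hypothetical callable standing in for whatever HTTP request retrieves the result state, and `make_fake_server` is a stand-in used only so the sketch runs without a real server; the state strings are assumptions for illustration.

```python
import time

def poll_until_available(fetch_state, interval_s=0.05, timeout_s=5.0):
    """Poll fetch_state() until it returns "available".

    Returns (elapsed_seconds, pending_count), where pending_count is the
    number of polls that came back without a result.
    fetch_state is a hypothetical zero-argument callable supplied by the caller.
    """
    start = time.monotonic()
    pending = 0
    while True:
        if fetch_state() == "available":
            return time.monotonic() - start, pending
        pending += 1
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("result not available in time")
        time.sleep(interval_s)

def make_fake_server(ready_after_s):
    """Stand-in for a server that exposes its result after a fixed delay."""
    created = time.monotonic()
    def fetch_state():
        ready = time.monotonic() - created >= ready_after_s
        return "available" if ready else "running"
    return fetch_state

elapsed, pending = poll_until_available(make_fake_server(0.2), interval_s=0.05)
print(f"available after {elapsed * 1000:.0f}ms, {pending} pending responses")
```

Comparing the elapsed time and pending count against the execution time logged on the server side gives a rough lower bound on the overhead that is not spent running the query.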


   I've since discovered these settings in livy-client.conf.template:

# Initial interval before polling for Job results
# livy.client.http.job.initial-poll-interval = 100ms
# Maximum interval between successive polls
# livy.client.http.job.max-poll-interval = 5s

I also looked at the Livy source and noticed that it seems to use a geometric interval for polling:
https://github.com/cloudera/livy/blob/5de6cf21c61db4093646a23c65c37c8b52202dc8/client-http/src/main/java/com/cloudera/livy/client/http/JobHandleImpl.java#L266
I'm thinking that could be the source of my issue but I need a chance to dive deeper.  Do you think tuning those parameters could improve the situation?
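
As a rough illustration of what a geometric polling interval would mean for latency, here is a small Python sketch. It assumes the interval doubles after every poll and is capped at the maximum; the doubling factor is my assumption and has not been checked against the linked source. The defaults mirror the two template values quoted above.

```python
def geometric_poll_times_ms(initial_ms=100, max_ms=5000, factor=2, count=8):
    """Return the cumulative times (ms) at which successive polls fire.

    Assumption: each interval is the previous one times `factor`,
    capped at max_ms. The factor of 2 is illustrative only.
    """
    times = []
    interval = initial_ms
    elapsed = 0
    for _ in range(count):
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * factor, max_ms)
    return times

print(geometric_poll_times_ms())
# [100, 300, 700, 1500, 3100, 6300, 11300, 16300]
```

Under these assumptions, a result that becomes ready just after one poll is not seen until the next one, so the added delay grows with each missed poll (up to 200ms after the first poll, 400ms after the second, and so on).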


Thanks,

Tim



________________________________
From: Saisai Shao <sa...@gmail.com>
Sent: Wednesday, August 1, 2018 7:23:55 PM
To: user@livy.incubator.apache.org
Subject: Re: How to tune Livy for fast queries

Some network cost should probably also be counted. There is no such configuration for tuning. If you find a performance issue, you can file a JIRA or even submit a patch to fix Livy.

