Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2018/09/29 12:04:38 UTC

Long running impala query

Hi Community.

I'm investigating one of my Impala queries that runs for 18 minutes at a
specific time, but when I run it ad hoc at other times it only takes a few
seconds.

I looked at the cluster metrics, the cluster resources, and the Impala
metrics, and I don't see anything out of the regular load.

In the query profile I see that one of the exchange steps, moving data
between the nodes, took most of the query time.

When I run the query ad hoc now it takes only a few seconds to finish, so
I'm interested in understanding what could cause such an issue, whether on
this query specifically or on the cluster in general.

I'm thinking of changing the query's scheduled time, but I'd prefer to
understand the root cause so I can avoid this in the future.

Below are the query metrics; the query profile is attached.

Thanks in advance.

Duration: 18.7m
Rows Produced: 2822780
Aggregate Peak Memory Usage: 683.4 MiB
Bytes Streamed: 1 GiB
HDFS Average Scan Range: 752 KiB
HDFS Bytes Read: 1 GiB
HDFS Bytes Read From Cache: 0 B
Threads: CPU Time: 89.44s
Threads: Network Receive Wait Time: 11h
Threads: Network Receive Wait Time Percentage: 39
Threads: Network Send Wait Time: 16.9h (sum of the time spent waiting to
send data over the network by all threads of the query; called
'thread_network_send_wait_time' in searches)
Threads: Network Send Wait Time Percentage: 60
Threads: Storage Wait Time: 4.3m
Threads: Storage Wait Time Percentage: 0






-- 
Take Care
Fawze Abujaber

Re: Long running impala query

Posted by Fawze Abujaber <fa...@gmail.com>.
Thanks Jeszy for the very detailed answer.

Yes, I had set the idle session timeout and the idle query timeout to 10
minutes, but I'm still seeing this query run for 18 minutes.

In my case I'm running one SQL query per session; this query loads the data
into an MSTR cube.

I'm now more worried about these configurations and how they may impact my
data load. I may not have understood the point fully, but I'd be happy to
understand: will closing the session or the query via the timeout result in
partially loaded data in the cube?



--
Take Care
Fawze Abujaber

Re: Long running impala query

Posted by Jeszy <je...@gmail.com>.
Yes, First row fetched was the key, along with 'ClientFetchWaitTimer'.
Unfortunately it's a bit more complicated than that. Impala won't release
the resources after first row is fetched since it doesn't buffer results
internally - meaning if you tear down the query, you won't be able to serve
future client fetches. The parameter 'idle_query_timeout' is used to
control how long Impala should wait for the client to fetch before moving
the query to the 'cancelled' state.
This gets a bit more shaky in your version. The 'cancelled' state is
different from the 'closed' state, and in 2.10 Impala will hold on to some
of the backend resources even when cancelled, only releasing everything
when the query is actually 'closed'. If you want to force query closure,
and not just cancellation, the setting is idle_session_timeout - closing
the session will close all associated queries. This option works well if
the users submit a single query at a time from a single session (which is
not always the case) - otherwise, even though query A has been idle for a
long time, actions on query B in the same session can prevent query A from
closing (though it will be cancelled, if idle_query_timeout is set).
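To make that concrete, here is a rough client-side sketch (assuming the
impyla DB-API client; the host, port and table name are placeholders, not
anything from your setup) of draining the result set and then closing the
query and the session explicitly, so the backend resources are freed
without relying on the idle timeouts at all:

    # Rough sketch only; host, port and table name are placeholders.
    from impala.dbapi import connect

    conn = connect(host='impala-coordinator.example.com', port=21050)
    cur = conn.cursor()
    try:
        cur.execute('SELECT * FROM some_table')  # placeholder query
        rows = cur.fetchall()                     # drain all results in one go
    finally:
        cur.close()   # closes the query, so backend resources can be released
        conn.close()  # closes the session, which closes any associated queries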

The good news is that all of this is much simpler in versions having
IMPALA-1575, which is 2.11+. In those versions, cancelling the query will
release all backend resources and just keep some state on the coordinator
so the client has a chance to know what's going on. This way the
'idle_query_timeout' option works like a charm. If you're having trouble
with this, I'd strongly recommend upgrading to 2.11+.

Cheers


Re: Long running impala query

Posted by Fawze Abujaber <fa...@gmail.com>.
Hi Jeszy,

So appreciative of your response; I learned something new :)

Quick question: does this mean that Impala released the resources after 2
seconds, or does it hold on to the resources for the 18 minutes? Indeed,
yes, we are running this query through MSTR; I will look for a solution to
this.

And I understand that you looked at First row fetched: 1.79s (1793452923),
which tells us it finished in less than 2 seconds, right?




-- 
Take Care
Fawze Abujaber

Re: Long running impala query

Posted by Jeszy <je...@gmail.com>.
Hey Fawze,

Your profile has:

    Query Timeline
      Query submitted: 115.23us (115234)
      Planning finished: 11ms (11653012)
      Ready to start on 51 backends: 22ms (22019435)
      All 51 execution backends (163 fragment instances) started: 62ms (62790505)
      First dynamic filter received: 428ms (428128089)
      Rows available: 1.79s (1793346081)
      First row fetched: 1.79s (1793452923)
      Unregister query: 18.7m (1124881163639)
  ImpalaServer
    - ClientFetchWaitTimer: 18.5m (1108994435686)
    - InactiveTotalTime: 0ns (0)
    - RowMaterializationTimer: 14.03s (14027078174)

This means that Impala was ready to return rows less than 2 seconds after
the query was submitted, but whoever submitted the query didn't want to
look at the results for some time. This often happens with tools like Hue,
which page through the results on demand versus fetching them all at once.
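For illustration, a rough sketch of what such a client looks like (assuming
the impyla DB-API client; the host, table name and the delay are
placeholders). The query stays registered between fetches, and the waiting
shows up as ClientFetchWaitTimer and the network send wait time in the
profile:

    # Rough sketch only; host, table name and the sleep are placeholders.
    import time
    from impala.dbapi import connect

    conn = connect(host='impala-coordinator.example.com', port=21050)
    cur = conn.cursor()
    cur.execute('SELECT * FROM some_table')  # rows become available quickly

    while True:
        batch = cur.fetchmany(1000)  # fetch one page at a time, on demand
        if not batch:
            break
        print(len(batch))            # stand-in for rendering a page of results
        time.sleep(60)               # e.g. a user slowly paging through a UI

    cur.close()   # only now is the query unregistered
    conn.close()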

HTH

