Posted to dev@drill.apache.org by Kunal Khatua <kk...@mapr.com> on 2017/06/02 18:29:49 UTC

Re: [External] Re: UNORDERED_RECEIVER taking 70% of query time

Hi Jasbir


I don't think the Apache mailing lists allow you to send attachments, except maybe text files. (The txt file made it through.)


In your Operator Profile, you'll see two columns: %Fragment Time and %QueryTime.

Hovering your mouse over those table headers should show you a description of each.


%Fragment Time is the fraction of time that the threads of that Major Fragment spent on a specific operator. In other words, it shows which operator the threads of a major fragment spent the most time on.


%QueryTime is the fraction of time that the threads of ALL the Major Fragments spent on a specific operator. In other words, it shows which operator, implicitly, consumed the most CPU resources.


From the latter, it appears that the HashJoin (03-xx-04) and Parquet Scan (03-xx-06) are the biggest bottlenecks. The unordered receiver is not the bottleneck in the query.
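
To make the difference between the two columns concrete, here is a small sketch in standard SQL. The process times, and the PROJECT operator in fragment 01, are made-up values for illustration and are not taken from the actual profile; the inline table is only a stand-in for the Operator Profile.

```sql
-- Hypothetical per-operator process times (ms); not values from the real profile.
SELECT major_fragment,
       operator_name,
       process_ms,
       -- share of the operator within its own major fragment
       100.0 * process_ms
             / SUM(process_ms) OVER (PARTITION BY major_fragment) AS pct_fragment_time,
       -- share of the operator across all major fragments
       100.0 * process_ms
             / SUM(process_ms) OVER ()                            AS pct_query_time
FROM (VALUES
        ('01', 'UNORDERED_RECEIVER',      70),
        ('01', 'PROJECT',                 30),
        ('03', 'HASH_JOIN',              600),
        ('03', 'PARQUET_ROW_GROUP_SCAN', 300)
     ) AS t (major_fragment, operator_name, process_ms);
```

With these numbers the receiver accounts for 70% of its own fragment's time but only 7% of the total (70 of 1000 ms), while the join and the scan account for 60% and 30% of the total. A high %Fragment Time on its own therefore does not make an operator the bottleneck.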


~ Kunal



________________________________
From: jasbir.sing@accenture.com <ja...@accenture.com>
Sent: Friday, June 2, 2017 12:13:21 AM
To: user@drill.apache.org; dev@drill.apache.org
Cc: maneesh.kothari@accenture.com; nitin.a.sareen@accenture.com; h.p.kumar@accenture.com
Subject: RE: [External] Re: UNORDERED_RECEIVER taking 70% of query time

Hi,

Please find the attached query profile.

I am running Drill in local mode on my laptop with default memory allocation to Apache Drill.

Let me know if you are not able to find the attachment.

Also, sending the file in RAR format.

Regards,
Jasbir Singh


-----Original Message-----
From: Abhishek Girish [mailto:agirish@apache.org]
Sent: Friday, June 02, 2017 11:00 AM
To: user@drill.apache.org
Subject: [External] Re: UNORDERED_RECEIVER taking 70% of query time

Attachment hasn't come through. Can you upload the query profile to some cloud storage and share a link to it?

Also, please share details on how large your dataset is, number of Drillbits, memory and other configurations.
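
One way to gather those details, assuming a Drill release that exposes the `sys` system tables, is to query them from the SQL shell or the web console. This is only a sketch; the columns returned may differ between versions.

```sql
-- Drillbits in the cluster (a single row when running in embedded/local mode)
SELECT * FROM sys.drillbits;

-- Heap and direct memory per Drillbit
SELECT * FROM sys.memory;

-- Drill version in use
SELECT * FROM sys.version;
```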


On Thu, Jun 1, 2017 at 10:18 PM, <ja...@accenture.com> wrote:

> Hi,
>
>
>
> I am running a simple query which performs JOIN operation between two
> parquet files and it takes around 3-4 secs and I noticed that 70% of
> the time is used by UNORDERED_RECEIVER.
>
>
>
> Sample query is –
>
>
>
> select sum(sales), week
> from dfs.`C:\parquet-location\F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet`
> where model_component_id in (
>   select model_component_id from dfs.`C:\parquet-location\poc48k.parquet`)
> group by week
>
>
>
>
>
> Can we somehow reduce unordered receiver time?
>
>
>
> Please find the below screenshot of Visualized plan
>
> ------------------------------
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you
> have received it in error, please notify the sender immediately and
> delete the original. Any other use of the e-mail by you is prohibited.
> Where allowed by local law, electronic communications with Accenture
> and its affiliates, including e-mail and instant messaging (including
> content), may be scanned by our systems for the purposes of
> information security and assessment of internal compliance with Accenture policy.
> ____________________________________________________________
> __________________________
>
> www.accenture.com<http://www.accenture.com>
>

Re: [External] Re: UNORDERED_RECEIVER taking 70% of query time

Posted by Kunal Khatua <kk...@mapr.com>.
I suspect you are running this on a laptop or a similar machine with a small number of cores (2 or 4, perhaps?). That would explain the


You could try to reduce the parallelization, but looking at the profile, it is already pretty low.
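
For reference, a sketch of how the degree of parallelism could be inspected and lowered for the current session. It assumes `planner.width.max_per_node` is the relevant option in your Drill version; the value 1 is only an example, not a recommendation.

```sql
-- Show the current parallelization-related settings
SELECT * FROM sys.options WHERE name LIKE 'planner.width%';

-- Cap the number of minor fragments per node for this session (example value)
ALTER SESSION SET `planner.width.max_per_node` = 1;
```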


If you are using Drill 1.9 or later, the Async Parquet Reader can be disabled (or tweaked) to reduce the number of active threads. This might put less strain on your system.
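
A minimal sketch of disabling the reader for one session. The option name `store.parquet.reader.pagereader.async` is assumed from the Drill 1.9+ documentation; check that it appears in `sys.options` on your installation before relying on it.

```sql
-- Turn off the asynchronous Parquet page reader for this session only
ALTER SESSION SET `store.parquet.reader.pagereader.async` = false;

-- Restore the asynchronous reader afterwards
ALTER SESSION SET `store.parquet.reader.pagereader.async` = true;
```

Re-running the query after each change and comparing the profiles shows whether the setting makes any difference on your machine.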


Kunal



RE: [External] Re: UNORDERED_RECEIVER taking 70% of query time

Posted by ja...@accenture.com.
Thanks Kunal.

The query that I am running on Apache Drill is:

select sum(sales), week from dfs.`C:\parquet-location\F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet` where model_component_id in(select model_component_id from dfs.`C:\parquet-location\poc48k.parquet`) group by week

The record counts and sizes of my two Parquet files are:

F8894180-AFFB-4803-B8CF-CCF883AA5AAF-Search_Snapshot_Data.parquet: approx. 4,000,000 records, file size 7 MB
poc48k.parquet: approx. 48,000 records, file size 1.68 MB

From the query profile, I can see that it is PARQUET_ROW_GROUP_SCAN that takes most of the %QueryTime. With this record count, is it expected to take so much time, or is there any way we can reduce it?

Regards,
Jasbir singh




