Posted to user@impala.apache.org by Tim Armstrong <ta...@cloudera.com> on 2018/01/03 16:20:39 UTC
Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and how to fix this gap?
That's a tricky one. I have a couple of ideas but it's a bit difficult to
confirm since the profile isn't really designed to easily answer questions
like this. ProbeTime measures wall-clock time rather than actual time spent
executing on the CPU.
My first guess is that the Kudu scan is using more CPU than the Parquet
scan, so the thread doing the scan is competing more with the join thread
for resources (either hyperthreads competing for resources on the same
physical core, or threads competing for time on logical processors). You
could compare the User CPU time for the fragment instances containing the
joins and scans to see if there is also a discrepancy in CPU time. That
won't answer this directly but might provide some clues.
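If you want to check that quickly, you could grep a saved text profile for
the relevant counters. A rough sketch; the exact counter names vary between
Impala versions, so adjust to whatever your profile actually contains:

grep -E 'ProbeTime|TotalThreadsUserTime|TotalThreadsSysTime' query_profile.txt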
My second guess is that there's some kind of subtle memory locality or
scheduling effect.
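If you want to poke at the scheduling/locality angle, perf can count thread
migrations, and numastat (from the numactl package) shows per-node memory
placement. A rough sketch, where <impalad_pid> stands for the impalad
process id; note the node-loads events are only exposed on some CPUs:

perf stat -e context-switches,cpu-migrations -p <impalad_pid> -- sleep 30
perf stat -e node-loads,node-load-misses -p <impalad_pid> -- sleep 30
numastat -p <impalad_pid>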
On Tue, Jan 2, 2018 at 11:40 PM, helifu <hz...@corp.netease.com> wrote:
> Hi everybody,
>
> Recently I ran a simple PHJ (partitioned hash join) on 'parquet' and
> 'kudu' with this SQL independently:
>
> select count(*) from lineitem as l, orders as o where l.l_orderkey =
> o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >=
> '1995-01-01';
>
> And I found that the 'ProbeTime' on 'kudu' is much larger than on
> 'parquet'!
>
> Below are the plans and profiles:
>
> Thanks in advance.
>
> 何李夫
>
> 2017-04-10 16:06:24
Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and how to fix this gap?
Posted by Mostafa Mokhtar <mm...@cloudera.com>.
Can you please attach the full query profiles?
Thanks
Mostafa
Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and how to fix this gap?
Posted by "helifu@gmail.com" <he...@gmail.com>.
The CPU and memory are sufficient; only one core is at 100% during the PHJ:
-----> top -H
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
146455 impala 20 0 34.534g 2.130g 80992 R 99.9 0.8 0:05.94 impalad
51161 kudu 20 0 50.041g 0.021t 36788 S 4.9 8.6 672:05.66 rpc reactor-511
51177 kudu 20 0 50.041g 0.021t 36788 S 4.9 8.6 0:37.49 rpc worker-5117
51163 kudu 20 0 50.041g 0.021t 36788 S 4.6 8.6 417:25.94 rpc reactor-511
51185 kudu 20 0 50.041g 0.021t 36788 S 3.9 8.6 0:03.43 rpc worker-5118
51337 kudu 20 0 50.041g 0.021t 36788 S 3.9 8.6 417:15.66 maintenance_sch
114193 impala 20 0 34.534g 2.130g 80992 S 3.9 0.8 0:03.96 rpc reactor-114
51191 kudu 20 0 50.041g 0.021t 36788 S 3.6 8.6 0:20.62 rpc worker-5119
51190 kudu 20 0 50.041g 0.021t 36788 S 2.9 8.6 0:05.81 rpc worker-5119
51162 kudu 20 0 50.041g 0.021t 36788 S 2.6 8.6 423:10.56 rpc reac
To make the runs quicker, I changed the SQL to: select count(*) from supplier as s, partsupp as p where s.s_suppkey = p.ps_suppkey;
Then I used 'sar' and 'perf' to capture some metrics, and found that the 'page-faults' count for kudu is almost twice that of parquet, though I don't know whether that is meaningful.
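To limit the counting to just the query window, perf can also be attached
for a fixed interval while the query runs; a sketch, assuming impalad's pid
is 29160 as in the perf output below:

perf stat -e page-faults,minor-faults,major-faults -p 29160 -- sleep 60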
kudu(sar) =================================================>>>
root@hadoop958:~# sar -B 1 1000
Linux 3.16.0-4-amd64 (hadoop958.hz.163.org) 01/08/2018 _x86_64_ (40 CPU)
05:21:21 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
05:21:22 PM 0.00 0.00 25.00 0.00 50.00 0.00 0.00 0.00 0.00
05:21:23 PM 0.00 0.00 24.00 0.00 49.00 0.00 0.00 0.00 0.00
05:21:24 PM 0.00 0.00 2385.00 0.00 516.00 0.00 0.00 0.00 0.00
05:21:25 PM 0.00 0.00 29041.00 0.00 20017.00 0.00 0.00 0.00 0.00
05:21:26 PM 0.00 0.00 40957.00 0.00 14432.00 0.00 0.00 0.00 0.00
05:21:27 PM 0.00 32.00 45776.00 0.00 1433.00 0.00 0.00 0.00 0.00
05:21:28 PM 0.00 0.00 12949.00 0.00 20950.00 0.00 0.00 0.00 0.00
05:21:29 PM 0.00 0.00 11335.00 0.00 22574.00 0.00 0.00 0.00 0.00
05:21:30 PM 0.00 100.00 15852.00 0.00 27227.00 0.00 0.00 0.00 0.00
05:21:31 PM 0.00 0.00 17678.00 0.00 29853.00 0.00 0.00 0.00 0.00
05:21:32 PM 0.00 0.00 12681.00 0.00 20305.00 0.00 0.00 0.00 0.00
05:21:33 PM 0.00 0.00 11062.00 0.00 20368.00 0.00 0.00 0.00 0.00
05:21:34 PM 0.00 0.00 11891.00 0.00 20317.00 0.00 0.00 0.00 0.00
05:21:35 PM 0.00 12.00 12495.00 0.00 20329.00 0.00 0.00 0.00 0.00
05:21:36 PM 0.00 68.00 10453.00 0.00 20320.00 0.00 0.00 0.00 0.00
05:21:37 PM 0.00 0.00 10313.00 0.00 20309.00 0.00 0.00 0.00 0.00
05:21:38 PM 0.00 0.00 10738.00 0.00 20350.00 0.00 0.00 0.00 0.00
05:21:39 PM 0.00 0.00 12216.00 0.00 20398.00 0.00 0.00 0.00 0.00
05:21:40 PM 0.00 0.00 12834.00 0.00 21116.00 0.00 0.00 0.00 0.00
05:21:41 PM 0.00 4.00 12497.00 0.00 20319.00 0.00 0.00 0.00 0.00
05:21:42 PM 0.00 16.00 11831.00 0.00 20376.00 0.00 0.00 0.00 0.00
05:21:43 PM 0.00 0.00 14087.00 0.00 24922.00 0.00 0.00 0.00 0.00
05:21:44 PM 0.00 0.00 17885.00 0.00 24956.00 0.00 0.00 0.00 0.00
05:21:45 PM 0.00 28.00 19622.00 0.00 24578.00 0.00 0.00 0.00 0.00
05:21:46 PM 0.00 4.00 11262.00 0.00 20351.00 0.00 0.00 0.00 0.00
05:21:47 PM 0.00 0.00 13248.00 0.00 21835.00 0.00 0.00 0.00 0.00
05:21:48 PM 0.00 24.00 12915.00 0.00 20322.00 0.00 0.00 0.00 0.00
05:21:49 PM 0.00 0.00 13170.00 0.00 21733.00 0.00 0.00 0.00 0.00
05:21:50 PM 0.00 20.00 13614.00 0.00 20660.00 0.00 0.00 0.00 0.00
05:21:51 PM 0.00 8.00 20138.00 0.00 20972.00 0.00 0.00 0.00 0.00
05:21:52 PM 0.00 0.00 13014.00 0.00 20372.00 0.00 0.00 0.00 0.00
05:21:53 PM 0.00 0.00 11716.00 0.00 20336.00 0.00 0.00 0.00 0.00
05:21:54 PM 0.00 12.00 10320.00 0.00 20442.00 0.00 0.00 0.00 0.00
05:21:54 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
05:21:55 PM 0.00 0.00 19144.00 0.00 29316.00 0.00 0.00 0.00 0.00
05:21:56 PM 0.00 96.00 10316.00 0.00 20297.00 0.00 0.00 0.00 0.00
05:21:57 PM 0.00 0.00 10310.00 0.00 20395.00 0.00 0.00 0.00 0.00
05:21:58 PM 0.00 0.00 10583.00 0.00 20864.00 0.00 0.00 0.00 0.00
05:21:59 PM 0.00 0.00 11559.00 0.00 21296.00 0.00 0.00 0.00 0.00
05:22:00 PM 0.00 0.00 10327.00 0.00 20313.00 0.00 0.00 0.00 0.00
05:22:01 PM 0.00 132.00 10406.00 0.00 29654.00 0.00 0.00 0.00 0.00
05:22:02 PM 0.00 12.00 11089.00 0.00 21865.00 0.00 0.00 0.00 0.00
05:22:03 PM 0.00 0.00 11085.00 0.00 21675.00 0.00 0.00 0.00 0.00
05:22:04 PM 0.00 0.00 11332.00 0.00 22583.00 0.00 0.00 0.00 0.00
05:22:05 PM 0.00 0.00 10820.00 0.00 21572.00 0.00 0.00 0.00 0.00
05:22:06 PM 0.00 40.00 15241.00 0.00 28091.00 0.00 0.00 0.00 0.00
05:22:07 PM 0.00 4.00 12533.00 0.00 24664.00 0.00 0.00 0.00 0.00
05:22:08 PM 0.00 0.00 12149.00 0.00 23097.00 0.00 0.00 0.00 0.00
05:22:09 PM 0.00 0.00 11511.00 0.00 21385.00 0.00 0.00 0.00 0.00
05:22:10 PM 0.00 0.00 10339.00 0.00 20445.00 0.00 0.00 0.00 0.00
05:22:11 PM 0.00 16.00 10306.00 0.00 20319.00 0.00 0.00 0.00 0.00
05:22:12 PM 0.00 16.00 10306.00 0.00 20382.00 0.00 0.00 0.00 0.00
05:22:13 PM 0.00 0.00 10312.00 0.00 20299.00 0.00 0.00 0.00 0.00
05:22:14 PM 0.00 0.00 10700.00 0.00 20363.00 0.00 0.00 0.00 0.00
05:22:15 PM 0.00 0.00 8725.00 0.00 18867.00 0.00 0.00 0.00 0.00
05:22:16 PM 0.00 20.00 55.00 0.00 96883.00 0.00 0.00 0.00 0.00
05:22:17 PM 0.00 0.00 24.00 0.00 53.00 0.00 0.00 0.00 0.00
05:22:18 PM 0.00 0.00 24.00 0.00 107.00 0.00 0.00 0.00 0.00
05:22:19 PM 0.00 0.00 1287.00 0.00 1136.00 0.00 0.00 0.00 0.00
05:22:20 PM 0.00 0.00 31.00 0.00 66.00 0.00 0.00 0.00 0.00
05:22:21 PM 0.00 64.00 24.00 0.00 58.00 0.00 0.00 0.00 0.00
05:22:22 PM 0.00 0.00 24.00 0.00 88.00 0.00 0.00 0.00 0.00
05:22:23 PM 0.00 0.00 24.00 0.00 89.00 0.00 0.00 0.00 0.00
05:22:24 PM 0.00 0.00 24.00 0.00 49.00 0.00 0.00 0.00 0.00
05:22:25 PM 0.00 0.00 24.00 0.00 49.00 0.00 0.00 0.00 0.00
05:22:26 PM 0.00 0.00 24.00 0.00 93.00 0.00 0.00 0.00 0.00
05:22:27 PM 0.00 4.00 24.00 0.00 97.00 0.00 0.00 0.00 0.00
parquet(sar) =============================================>>>
root@hadoop958:~# sar -B 1 1000
Linux 3.16.0-4-amd64 (hadoop958.hz.163.org) 01/08/2018 _x86_64_ (40 CPU)
04:37:37 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
04:37:38 PM 0.00 16.00 25.00 0.00 57.00 0.00 0.00 0.00 0.00
04:37:39 PM 0.00 0.00 14975.00 0.00 2585.00 0.00 0.00 0.00 0.00
04:37:40 PM 0.00 0.00 29143.00 0.00 6272.00 0.00 0.00 0.00 0.00
04:37:41 PM 0.00 0.00 67780.00 0.00 7579.00 0.00 0.00 0.00 0.00
04:37:42 PM 0.00 8.00 3734.00 0.00 4083.00 0.00 0.00 0.00 0.00
04:37:43 PM 0.00 60.00 7019.00 0.00 8188.00 0.00 0.00 0.00 0.00
04:37:44 PM 0.00 24.00 11843.00 0.00 8926.00 0.00 0.00 0.00 0.00
04:37:45 PM 0.00 0.00 9809.00 0.00 8289.00 0.00 0.00 0.00 0.00
04:37:46 PM 0.00 0.00 3792.00 0.00 4031.00 0.00 0.00 0.00 0.00
04:37:47 PM 0.00 0.00 3809.00 0.00 4057.00 0.00 0.00 0.00 0.00
04:37:48 PM 0.00 24.00 3796.00 0.00 4065.00 0.00 0.00 0.00 0.00
04:37:49 PM 0.00 0.00 6021.00 0.00 5349.00 0.00 0.00 0.00 0.00
04:37:50 PM 0.00 0.00 3762.00 0.00 4017.00 0.00 0.00 0.00 0.00
04:37:51 PM 0.00 0.00 13333.00 0.00 4450.00 0.00 0.00 0.00 0.00
04:37:52 PM 0.00 8.00 3839.00 0.00 4095.00 0.00 0.00 0.00 0.00
04:37:53 PM 0.00 0.00 3873.00 0.00 4134.00 0.00 0.00 0.00 0.00
04:37:54 PM 0.00 28.00 3763.00 0.00 3992.00 0.00 0.00 0.00 0.00
04:37:55 PM 0.00 0.00 12371.00 0.00 12656.00 0.00 0.00 0.00 0.00
04:37:56 PM 0.00 0.00 3731.00 0.00 60638.00 0.00 0.00 0.00 0.00
04:37:57 PM 0.00 104.00 3866.00 0.00 3835.00 0.00 0.00 0.00 0.00
04:37:58 PM 0.00 0.00 4116.00 0.00 4394.00 0.00 0.00 0.00 0.00
04:37:59 PM 0.00 0.00 7188.00 0.00 6622.00 0.00 0.00 0.00 0.00
04:38:00 PM 0.00 0.00 4940.00 0.00 10193.00 0.00 0.00 0.00 0.00
04:38:01 PM 0.00 60.00 133.00 0.00 9299.00 0.00 0.00 0.00 0.00
04:38:02 PM 0.00 44.00 49.00 0.00 299.00 0.00 0.00 0.00 0.00
04:38:03 PM 0.00 0.00 160.00 0.00 83.00 0.00 0.00 0.00 0.00
04:38:04 PM 0.00 0.00 24.00 0.00 64.00 0.00 0.00 0.00 0.00
04:38:05 PM 0.00 0.00 24.00 0.00 65.00 0.00 0.00 0.00 0.00
04:38:06 PM 0.00 0.00 746.00 0.00 470.00 0.00 0.00 0.00 0.00
04:38:07 PM 0.00 0.00 81.00 0.00 93162.00 0.00 0.00 0.00 0.00
04:38:08 PM 0.00 16.00 34.00 0.00 981.00 0.00 0.00 0.00 0.00
04:38:09 PM 0.00 0.00 680.00 0.00 545.00 0.00 0.00 0.00 0.00
04:38:10 PM 0.00 0.00 66.00 0.00 118.00 0.00 0.00 0.00 0.00
kudu(perf)===============================================>>>
root@hadoop958:~# perf stat -p 29160 -d
^C
Performance counter stats for process id '29160':
75367.439887 task-clock (msec) # 1.365 CPUs utilized
323,646 context-switches # 0.004 M/sec
8,683 cpu-migrations # 0.115 K/sec
632,653 page-faults # 0.008 M/sec
162,296,209,992 cycles # 2.153 GHz [93.18%]
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
86,693,164,726 instructions # 0.53 insns per cycle [95.31%]
15,026,610,761 branches # 199.378 M/sec [95.44%]
29,680,120 branch-misses # 0.20% of all branches [95.54%]
33,528,975,631 L1-dcache-loads # 444.873 M/sec [95.91%]
2,484,449,334 L1-dcache-load-misses # 7.41% of all L1-dcache hits [11.40%]
1,465,862,774 LLC-loads # 19.450 M/sec [10.11%]
<not supported> LLC-load-misses:HG
55.217146805 seconds time elapsed
parquet(perf) =============================================>>>
root@hadoop958:~# perf stat -p 29160 -d
^C
Performance counter stats for process id '29160':
51130.516140 task-clock (msec) # 1.557 CPUs utilized
316,994 context-switches # 0.006 M/sec
8,232 cpu-migrations # 0.161 K/sec
190,272 page-faults # 0.004 M/sec
102,279,158,848 cycles # 2.000 GHz [92.86%]
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
94,061,719,561 instructions # 0.92 insns per cycle [95.23%]
16,498,918,033 branches # 322.682 M/sec [95.15%]
30,073,148 branch-misses # 0.18% of all branches [95.15%]
29,651,847,951 L1-dcache-loads # 579.925 M/sec [95.43%]
2,101,137,839 L1-dcache-load-misses # 7.09% of all L1-dcache hits [12.35%]
1,194,026,289 LLC-loads # 23.353 M/sec [10.25%]
<not supported> LLC-load-misses:HG
32.849196160 seconds time elapsed
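Comparing the two runs above: the kudu run shows lower IPC (0.53 vs 0.92
insns per cycle) and more page-faults (632,653 vs 190,272). As a next step,
a sketch to record where the extra cycles go, with the same pid as above:

perf record -g -p 29160 -- sleep 30
perf report --sort=dso,symbol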