Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and How to fix this gap?

Posted by Tim Armstrong <ta...@cloudera.com> on 2018/01/03 16:20:39 UTC.

That's a tricky one. I have a couple of ideas but it's a bit difficult to
confirm since the profile isn't really designed to easily answer questions
like this. ProbeTime measures wall-clock time rather than actual time spent
executing on the CPU.

My first guess is that the Kudu scan is using more CPU than the Parquet
scan, so the thread doing the scan is competing more with the join thread
for resources (either hyperthreads contending for execution resources on
the same physical core, or threads competing for time on logical
processors), which would inflate the wall-clock ProbeTime. You could
compare the User CPU time for the fragment instances containing the joins
and scans to see if there is also a discrepancy in CPU time. That won't
answer the question directly but might provide some clues.
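
If you want to sanity-check the hyperthreading theory outside Impala, a
hypothetical standalone test like the one below can show it: the same fixed
amount of work per thread takes longer in wall-clock terms when the two
threads share a physical core. The sibling CPU ids are assumptions for a
40-logical-CPU box; verify them against
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list first.

// ht_contention.cc -- hypothetical test (not Impala code): fixed per-thread
// work measured on two separate physical cores vs. on two hyperthread
// siblings. CPU ids below are assumptions -- check thread_siblings_list.
// Build: g++ -O2 -pthread ht_contention.cc
#include <pthread.h>
#include <sched.h>
#include <chrono>
#include <cstdio>

static const int kCoreA = 0;        // assumed: its own physical core
static const int kCoreB = 1;        // assumed: a different physical core
static const int kSiblingOfA = 20;  // assumed: hyperthread sibling of CPU 0

static void* Work(void* arg) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(*static_cast<int*>(arg), &set);
  pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
  volatile double x = 1.0;
  for (long i = 0; i < 1000000000L; ++i) x *= 1.0000001;  // fixed work
  return nullptr;
}

static double RunPair(int cpu1, int cpu2) {
  auto t0 = std::chrono::steady_clock::now();
  pthread_t a, b;
  pthread_create(&a, nullptr, Work, &cpu1);
  pthread_create(&b, nullptr, Work, &cpu2);
  pthread_join(a, nullptr);
  pthread_join(b, nullptr);
  return std::chrono::duration<double>(
      std::chrono::steady_clock::now() - t0).count();
}

int main() {
  // Same total work both times; only the thread placement changes.
  printf("separate cores: %.2fs\n", RunPair(kCoreA, kCoreB));
  printf("ht siblings:    %.2fs\n", RunPair(kCoreA, kSiblingOfA));
  return 0;
}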

My second guess is that there's some kind of subtle memory locality or
scheduling effect.

On Tue, Jan 2, 2018 at 11:40 PM, helifu <hz...@corp.netease.com> wrote:

> Hi everybody,
>
>
>
> Recently I ran a simple PHJ (partitioned hash join) on 'parquet' and
> 'kudu' with this SQL independently:
>
> select count(*) from lineitem as l, orders as o where l.l_orderkey =
> o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >=
> '1995-01-01';
>
>
>
> And I found that the 'ProbeTime' on 'kudu' is much larger than on
> 'parquet'!!
>
>
>
> Below are the plans and profiles:
>
>
>
> Thanks in advance.
>
>
>
>
>
> 何李夫
>
> 2017-04-10 16:06:24
>
>
>

Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and How to fix this gap?

Posted by Mostafa Mokhtar <mm...@cloudera.com>.
Can you please attach the full query profiles? 

Thanks 
Mostafa

> On Jan 3, 2018, at 8:20 AM, Tim Armstrong <ta...@cloudera.com> wrote:
> 
> That's a tricky one. I have a couple of ideas but it's a bit difficult to confirm since the profile isn't really designed to easily answer questions like this. ProbeTime measures wall-clock time rather than actual time spent executing on the CPU.
> 
> My first guess is that the Kudu scan is using more CPU than the Parquet scan, so the thread doing the scan is competing more with the join thread for resources (either hyperthreads contending for execution resources on the same physical core, or threads competing for time on logical processors), which would inflate the wall-clock ProbeTime. You could compare the User CPU time for the fragment instances containing the joins and scans to see if there is also a discrepancy in CPU time. That won't answer the question directly but might provide some clues.
> 
> My second guess is that there's some kind of subtle memory locality or scheduling effect.
> 
>> On Tue, Jan 2, 2018 at 11:40 PM, helifu <hz...@corp.netease.com> wrote:
>> Hi everybody,
>> 
>>  
>> 
>> Recently I ran a simple PHJ (partitioned hash join) on 'parquet' and 'kudu' with this SQL independently:
>> 
>> select count(*) from lineitem as l, orders as o where l.l_orderkey = o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >= '1995-01-01';
>> 
>>  
>> 
>> And I found that the 'ProbeTime' on 'kudu' is much larger than on 'parquet'!!
>> 
>> 
>> 
>> 
>> 
>>  
>> 
>> Below are the plans and profiles:
>> 
>> 
>> 
>>  
>> 
>> Thanks in advance.
>> 
>>  
>> 
>>  
>> 
>> 何李夫
>> 
>> 2017-04-10 16:06:24
>> 
>>  
>> 
> 

Re: Could anybody help to explain why there is such a big gap in 'ProbeTime' and How to fix this gap?

Posted by "helifu@gmail.com" <he...@gmail.com>.
The CPU and memory are sufficient. Only one core is at 100% during the PHJ:
-----> top + H (per-thread view)
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     146455 impala    20   0 34.534g 2.130g  80992 R 99.9  0.8   0:05.94 impalad
     51161 kudu      20   0 50.041g 0.021t  36788 S  4.9  8.6 672:05.66 rpc reactor-511
     51177 kudu      20   0 50.041g 0.021t  36788 S  4.9  8.6   0:37.49 rpc worker-5117
     51163 kudu      20   0 50.041g 0.021t  36788 S  4.6  8.6 417:25.94 rpc reactor-511
     51185 kudu      20   0 50.041g 0.021t  36788 S  3.9  8.6   0:03.43 rpc worker-5118
     51337 kudu      20   0 50.041g 0.021t  36788 S  3.9  8.6 417:15.66 maintenance_sch
     114193 impala    20   0 34.534g 2.130g  80992 S  3.9  0.8   0:03.96 rpc reactor-114
     51191 kudu      20   0 50.041g 0.021t  36788 S  3.6  8.6   0:20.62 rpc worker-5119
     51190 kudu      20   0 50.041g 0.021t  36788 S  2.9  8.6   0:05.81 rpc worker-5119
     51162 kudu      20   0 50.041g 0.021t  36788 S  2.6  8.6 423:10.56 rpc reac

And to make each run quicker, I changed the SQL to: select count(*) from supplier as s, partsupp as p where s.s_suppkey = p.ps_suppkey;
Then I used 'sar' and 'perf' to capture some statistics, and found that the 'page-faults' count for Kudu is almost twice that of Parquet, though I don't know whether this is meaningful.
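
One plausible source for that (just a hypothesis; the sketch below is a
standalone test, not Impala or Kudu code) is that a path which hands back
freshly allocated row-batch memory takes a minor fault on every page at
first touch, while a path that recycles buffers does not:

// minflt_demo.cc -- hypothetical sketch (not Impala/Kudu code): freshly
// allocated memory takes a minor page fault on first touch; a recycled
// buffer does not. Build: g++ -O2 minflt_demo.cc
#include <sys/resource.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

static long MinorFaults() {
  rusage ru;
  getrusage(RUSAGE_SELF, &ru);
  return ru.ru_minflt;
}

int main() {
  const size_t kBatchBytes = 8u << 20;  // 8 MB per simulated row batch
  const int kBatches = 100;

  // Fresh allocation per batch: glibc serves 8 MB requests with mmap and
  // returns them on free, so every batch faults its pages in again.
  long before = MinorFaults();
  for (int i = 0; i < kBatches; ++i) {
    char* buf = static_cast<char*>(malloc(kBatchBytes));
    memset(buf, i, kBatchBytes);  // first touch -> minor faults
    free(buf);
  }
  printf("fresh alloc per batch: %ld minor faults\n", MinorFaults() - before);

  // Recycled buffer: pages fault in once and are then reused.
  char* buf = static_cast<char*>(malloc(kBatchBytes));
  before = MinorFaults();
  for (int i = 0; i < kBatches; ++i) memset(buf, i, kBatchBytes);
  printf("recycled buffer:       %ld minor faults\n", MinorFaults() - before);
  free(buf);
  return 0;
}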

kudu(sar) =================================================>>>
root@hadoop958:~# sar -B 1 1000
Linux 3.16.0-4-amd64 (hadoop958.hz.163.org) 	01/08/2018 	_x86_64_	(40 CPU)

05:21:21 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
05:21:22 PM      0.00      0.00     25.00      0.00     50.00      0.00      0.00      0.00      0.00
05:21:23 PM      0.00      0.00     24.00      0.00     49.00      0.00      0.00      0.00      0.00
05:21:24 PM      0.00      0.00   2385.00      0.00    516.00      0.00      0.00      0.00      0.00
05:21:25 PM      0.00      0.00  29041.00      0.00  20017.00      0.00      0.00      0.00      0.00
05:21:26 PM      0.00      0.00  40957.00      0.00  14432.00      0.00      0.00      0.00      0.00
05:21:27 PM      0.00     32.00  45776.00      0.00   1433.00      0.00      0.00      0.00      0.00
05:21:28 PM      0.00      0.00  12949.00      0.00  20950.00      0.00      0.00      0.00      0.00
05:21:29 PM      0.00      0.00  11335.00      0.00  22574.00      0.00      0.00      0.00      0.00
05:21:30 PM      0.00    100.00  15852.00      0.00  27227.00      0.00      0.00      0.00      0.00
05:21:31 PM      0.00      0.00  17678.00      0.00  29853.00      0.00      0.00      0.00      0.00
05:21:32 PM      0.00      0.00  12681.00      0.00  20305.00      0.00      0.00      0.00      0.00
05:21:33 PM      0.00      0.00  11062.00      0.00  20368.00      0.00      0.00      0.00      0.00
05:21:34 PM      0.00      0.00  11891.00      0.00  20317.00      0.00      0.00      0.00      0.00
05:21:35 PM      0.00     12.00  12495.00      0.00  20329.00      0.00      0.00      0.00      0.00
05:21:36 PM      0.00     68.00  10453.00      0.00  20320.00      0.00      0.00      0.00      0.00
05:21:37 PM      0.00      0.00  10313.00      0.00  20309.00      0.00      0.00      0.00      0.00
05:21:38 PM      0.00      0.00  10738.00      0.00  20350.00      0.00      0.00      0.00      0.00
05:21:39 PM      0.00      0.00  12216.00      0.00  20398.00      0.00      0.00      0.00      0.00
05:21:40 PM      0.00      0.00  12834.00      0.00  21116.00      0.00      0.00      0.00      0.00
05:21:41 PM      0.00      4.00  12497.00      0.00  20319.00      0.00      0.00      0.00      0.00
05:21:42 PM      0.00     16.00  11831.00      0.00  20376.00      0.00      0.00      0.00      0.00
05:21:43 PM      0.00      0.00  14087.00      0.00  24922.00      0.00      0.00      0.00      0.00
05:21:44 PM      0.00      0.00  17885.00      0.00  24956.00      0.00      0.00      0.00      0.00
05:21:45 PM      0.00     28.00  19622.00      0.00  24578.00      0.00      0.00      0.00      0.00
05:21:46 PM      0.00      4.00  11262.00      0.00  20351.00      0.00      0.00      0.00      0.00
05:21:47 PM      0.00      0.00  13248.00      0.00  21835.00      0.00      0.00      0.00      0.00
05:21:48 PM      0.00     24.00  12915.00      0.00  20322.00      0.00      0.00      0.00      0.00
05:21:49 PM      0.00      0.00  13170.00      0.00  21733.00      0.00      0.00      0.00      0.00
05:21:50 PM      0.00     20.00  13614.00      0.00  20660.00      0.00      0.00      0.00      0.00
05:21:51 PM      0.00      8.00  20138.00      0.00  20972.00      0.00      0.00      0.00      0.00
05:21:52 PM      0.00      0.00  13014.00      0.00  20372.00      0.00      0.00      0.00      0.00
05:21:53 PM      0.00      0.00  11716.00      0.00  20336.00      0.00      0.00      0.00      0.00
05:21:54 PM      0.00     12.00  10320.00      0.00  20442.00      0.00      0.00      0.00      0.00

05:21:54 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
05:21:55 PM      0.00      0.00  19144.00      0.00  29316.00      0.00      0.00      0.00      0.00
05:21:56 PM      0.00     96.00  10316.00      0.00  20297.00      0.00      0.00      0.00      0.00
05:21:57 PM      0.00      0.00  10310.00      0.00  20395.00      0.00      0.00      0.00      0.00
05:21:58 PM      0.00      0.00  10583.00      0.00  20864.00      0.00      0.00      0.00      0.00
05:21:59 PM      0.00      0.00  11559.00      0.00  21296.00      0.00      0.00      0.00      0.00
05:22:00 PM      0.00      0.00  10327.00      0.00  20313.00      0.00      0.00      0.00      0.00
05:22:01 PM      0.00    132.00  10406.00      0.00  29654.00      0.00      0.00      0.00      0.00
05:22:02 PM      0.00     12.00  11089.00      0.00  21865.00      0.00      0.00      0.00      0.00
05:22:03 PM      0.00      0.00  11085.00      0.00  21675.00      0.00      0.00      0.00      0.00
05:22:04 PM      0.00      0.00  11332.00      0.00  22583.00      0.00      0.00      0.00      0.00
05:22:05 PM      0.00      0.00  10820.00      0.00  21572.00      0.00      0.00      0.00      0.00
05:22:06 PM      0.00     40.00  15241.00      0.00  28091.00      0.00      0.00      0.00      0.00
05:22:07 PM      0.00      4.00  12533.00      0.00  24664.00      0.00      0.00      0.00      0.00
05:22:08 PM      0.00      0.00  12149.00      0.00  23097.00      0.00      0.00      0.00      0.00
05:22:09 PM      0.00      0.00  11511.00      0.00  21385.00      0.00      0.00      0.00      0.00
05:22:10 PM      0.00      0.00  10339.00      0.00  20445.00      0.00      0.00      0.00      0.00
05:22:11 PM      0.00     16.00  10306.00      0.00  20319.00      0.00      0.00      0.00      0.00
05:22:12 PM      0.00     16.00  10306.00      0.00  20382.00      0.00      0.00      0.00      0.00
05:22:13 PM      0.00      0.00  10312.00      0.00  20299.00      0.00      0.00      0.00      0.00
05:22:14 PM      0.00      0.00  10700.00      0.00  20363.00      0.00      0.00      0.00      0.00
05:22:15 PM      0.00      0.00   8725.00      0.00  18867.00      0.00      0.00      0.00      0.00
05:22:16 PM      0.00     20.00     55.00      0.00  96883.00      0.00      0.00      0.00      0.00
05:22:17 PM      0.00      0.00     24.00      0.00     53.00      0.00      0.00      0.00      0.00
05:22:18 PM      0.00      0.00     24.00      0.00    107.00      0.00      0.00      0.00      0.00
05:22:19 PM      0.00      0.00   1287.00      0.00   1136.00      0.00      0.00      0.00      0.00
05:22:20 PM      0.00      0.00     31.00      0.00     66.00      0.00      0.00      0.00      0.00
05:22:21 PM      0.00     64.00     24.00      0.00     58.00      0.00      0.00      0.00      0.00
05:22:22 PM      0.00      0.00     24.00      0.00     88.00      0.00      0.00      0.00      0.00
05:22:23 PM      0.00      0.00     24.00      0.00     89.00      0.00      0.00      0.00      0.00
05:22:24 PM      0.00      0.00     24.00      0.00     49.00      0.00      0.00      0.00      0.00
05:22:25 PM      0.00      0.00     24.00      0.00     49.00      0.00      0.00      0.00      0.00
05:22:26 PM      0.00      0.00     24.00      0.00     93.00      0.00      0.00      0.00      0.00
05:22:27 PM      0.00      4.00     24.00      0.00     97.00      0.00      0.00      0.00      0.00

parquet(sar) =============================================>>>
root@hadoop958:~# sar -B 1 1000
Linux 3.16.0-4-amd64 (hadoop958.hz.163.org) 	01/08/2018 	_x86_64_	(40 CPU)

04:37:37 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
04:37:38 PM      0.00     16.00     25.00      0.00     57.00      0.00      0.00      0.00      0.00
04:37:39 PM      0.00      0.00  14975.00      0.00   2585.00      0.00      0.00      0.00      0.00
04:37:40 PM      0.00      0.00  29143.00      0.00   6272.00      0.00      0.00      0.00      0.00
04:37:41 PM      0.00      0.00  67780.00      0.00   7579.00      0.00      0.00      0.00      0.00
04:37:42 PM      0.00      8.00   3734.00      0.00   4083.00      0.00      0.00      0.00      0.00
04:37:43 PM      0.00     60.00   7019.00      0.00   8188.00      0.00      0.00      0.00      0.00
04:37:44 PM      0.00     24.00  11843.00      0.00   8926.00      0.00      0.00      0.00      0.00
04:37:45 PM      0.00      0.00   9809.00      0.00   8289.00      0.00      0.00      0.00      0.00
04:37:46 PM      0.00      0.00   3792.00      0.00   4031.00      0.00      0.00      0.00      0.00
04:37:47 PM      0.00      0.00   3809.00      0.00   4057.00      0.00      0.00      0.00      0.00
04:37:48 PM      0.00     24.00   3796.00      0.00   4065.00      0.00      0.00      0.00      0.00
04:37:49 PM      0.00      0.00   6021.00      0.00   5349.00      0.00      0.00      0.00      0.00
04:37:50 PM      0.00      0.00   3762.00      0.00   4017.00      0.00      0.00      0.00      0.00
04:37:51 PM      0.00      0.00  13333.00      0.00   4450.00      0.00      0.00      0.00      0.00
04:37:52 PM      0.00      8.00   3839.00      0.00   4095.00      0.00      0.00      0.00      0.00
04:37:53 PM      0.00      0.00   3873.00      0.00   4134.00      0.00      0.00      0.00      0.00
04:37:54 PM      0.00     28.00   3763.00      0.00   3992.00      0.00      0.00      0.00      0.00
04:37:55 PM      0.00      0.00  12371.00      0.00  12656.00      0.00      0.00      0.00      0.00
04:37:56 PM      0.00      0.00   3731.00      0.00  60638.00      0.00      0.00      0.00      0.00
04:37:57 PM      0.00    104.00   3866.00      0.00   3835.00      0.00      0.00      0.00      0.00
04:37:58 PM      0.00      0.00   4116.00      0.00   4394.00      0.00      0.00      0.00      0.00
04:37:59 PM      0.00      0.00   7188.00      0.00   6622.00      0.00      0.00      0.00      0.00
04:38:00 PM      0.00      0.00   4940.00      0.00  10193.00      0.00      0.00      0.00      0.00
04:38:01 PM      0.00     60.00    133.00      0.00   9299.00      0.00      0.00      0.00      0.00
04:38:02 PM      0.00     44.00     49.00      0.00    299.00      0.00      0.00      0.00      0.00
04:38:03 PM      0.00      0.00    160.00      0.00     83.00      0.00      0.00      0.00      0.00
04:38:04 PM      0.00      0.00     24.00      0.00     64.00      0.00      0.00      0.00      0.00
04:38:05 PM      0.00      0.00     24.00      0.00     65.00      0.00      0.00      0.00      0.00
04:38:06 PM      0.00      0.00    746.00      0.00    470.00      0.00      0.00      0.00      0.00
04:38:07 PM      0.00      0.00     81.00      0.00  93162.00      0.00      0.00      0.00      0.00
04:38:08 PM      0.00     16.00     34.00      0.00    981.00      0.00      0.00      0.00      0.00
04:38:09 PM      0.00      0.00    680.00      0.00    545.00      0.00      0.00      0.00      0.00
04:38:10 PM      0.00      0.00     66.00      0.00    118.00      0.00      0.00      0.00      0.00

kudu(perf)===============================================>>>
root@hadoop958:~# perf stat -p 29160 -d
^C
 Performance counter stats for process id '29160':

      75367.439887      task-clock (msec)         #    1.365 CPUs utilized          
           323,646      context-switches          #    0.004 M/sec                  
             8,683      cpu-migrations            #    0.115 K/sec                  
           632,653      page-faults               #    0.008 M/sec                  
   162,296,209,992      cycles                    #    2.153 GHz                     [93.18%]
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
    86,693,164,726      instructions              #    0.53  insns per cycle         [95.31%]
    15,026,610,761      branches                  #  199.378 M/sec                   [95.44%]
        29,680,120      branch-misses             #    0.20% of all branches         [95.54%]
    33,528,975,631      L1-dcache-loads           #  444.873 M/sec                   [95.91%]
     2,484,449,334      L1-dcache-load-misses     #    7.41% of all L1-dcache hits   [11.40%]
     1,465,862,774      LLC-loads                 #   19.450 M/sec                   [10.11%]
   <not supported>      LLC-load-misses:HG       

      55.217146805 seconds time elapsed

parquet(perf) =============================================>>>
root@hadoop958:~# perf stat -p 29160 -d
^C
 Performance counter stats for process id '29160':

      51130.516140      task-clock (msec)         #    1.557 CPUs utilized          
           316,994      context-switches          #    0.006 M/sec                  
             8,232      cpu-migrations            #    0.161 K/sec                  
           190,272      page-faults               #    0.004 M/sec                  
   102,279,158,848      cycles                    #    2.000 GHz                     [92.86%]
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
    94,061,719,561      instructions              #    0.92  insns per cycle         [95.23%]
    16,498,918,033      branches                  #  322.682 M/sec                   [95.15%]
        30,073,148      branch-misses             #    0.18% of all branches         [95.15%]
    29,651,847,951      L1-dcache-loads           #  579.925 M/sec                   [95.43%]
     2,101,137,839      L1-dcache-load-misses     #    7.09% of all L1-dcache hits   [12.35%]
     1,194,026,289      LLC-loads                 #   23.353 M/sec                   [10.25%]
   <not supported>      LLC-load-misses:HG       

      32.849196160 seconds time elapsed



On 2018-01-04 00:20, Tim Armstrong <ta...@cloudera.com> wrote: 
> That's a tricky one. I have a couple of ideas but it's a bit difficult to
> confirm since the profile isn't really designed to easily answer questions
> like this. ProbeTime measures wall-clock time rather than actual time spent
> executing on the CPU.
> 
> My first guess is that the Kudu scan is using more CPU than the Parquet
> scan, so the thread doing the scan is competing more with the join thread
> for resources (either hyperthreads contending for execution resources on
> the same physical core, or threads competing for time on logical
> processors), which would inflate the wall-clock ProbeTime. You could
> compare the User CPU time for the fragment instances containing the joins
> and scans to see if there is also a discrepancy in CPU time. That won't
> answer the question directly but might provide some clues.
> 
> My second guess is that there's some kind of subtle memory locality or
> scheduling effect.
> 
> On Tue, Jan 2, 2018 at 11:40 PM, helifu <hz...@corp.netease.com> wrote:
> 
> > Hi everybody,
> >
> >
> >
> > Recently I ran a simple PHJ (partitioned hash join) on 'parquet' and
> > 'kudu' with this SQL independently:
> >
> > select count(*) from lineitem as l, orders as o where l.l_orderkey =
> > o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >=
> > '1995-01-01';
> >
> >
> >
> > And I found that the 'ProbeTime' on 'kudu' is much larger than on
> > 'parquet'!!
> >
> >
> >
> > Below are the plans and profiles:
> >
> >
> >
> > Thanks in advance.
> >
> >
> >
> >
> >
> > 何李夫
> >
> > 2017-04-10 16:06:24
> >
> >
> >
>