You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Zhang, Liyun" <li...@intel.com> on 2017/10/12 08:39:37 UTC

How to measure the execution time of query on Hive on Tez

Hi  all:
  Anyone knows how to view the detail execution time of every map/reduce task in hive on tez?
I screenshot the result:
Run DAG is  324.s . But this is not the sum of DURATION time of every tasks( 665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1).  So which parameter DURATION(ms) or CPU_TIME(ms) should be used?
[cid:image001.png@01D34378.B0FDB5B0]


Appreciate to get some feedback from you!


Best Regards
Kelly Zhang/Zhang,Liyun


Re: How to measure the execution time of query on Hive on Tez

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.
Hi

There are couple of tables that you are seeing, the top one is Query Execution Summary and next one is Task Execution Summary.

At the end of your query execution, you should see something like "Time taken: xx.xx seconds” in hive CLI or beeline. This represents the overall time the query took from start (start of semantic analysis) to finish (end of fetching results).

This overall time is broken down in the Query Execution Summary table. When you submit the query, query goes through multiple phases before returning the results. 
The time for each of these phases is defined in Query Execution Summary tables. 

If you want the query runtime then you should look at the last line in hive CLI or beeline which represents the end-to-end time.
If you are just interested in how long the query actually executed in the cluster then you should look at Run DAG time.

Run DAG is NOT sum of duration of all vertices as vertices can be scheduled in parallel. 
Run DAG is the time that Hive computes when it knows that DAG has been accepted and started by Application Master until the completion of DAG (could be failed, errored or succeeded).

Hope this helps.

Thanks
Prasanth

> On Oct 12, 2017, at 2:46 PM, Zhang, Liyun <li...@intel.com> wrote:
> 
> Hi all:
>   Maybe in last mail the attached picture is not shown.
> I re-described my question here.  I saw following statistics about the runtime when running query.
> 
> The Run DAG is 318s.  But it is not the sum of DURATION of all VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s
> Not the sum of CPU_TIME.   There are several indicator "RUN DAG","DURATION","CPU_TIME",  which indicator I should use when measure the performance? Sometimes I found there is significant improvement in sum of (CPU_TIME) while there is no significant improvement in "RUN DAG".  Is this normal?  Appreciate to get some feedback from you!
> 
> 
> 
> 2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION                            DURATION
> 2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query                           3.72s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan                            0.60s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan                             0.61s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG                               0.52s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG                               318.54s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,263  INFO [main] SessionState:
> 2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
> 2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
> 2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
> 2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,298  INFO [main] SessionState:      Map 1          59549.00      1,355,520         28,565     550,076,554    1,602,119,842
> 2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 12           4069.00         15,670            522          73,049              732
> 2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 13           3055.00         14,030            567             212              212
> 2017-10-12T16:29:39,301  INFO [main] SessionState:     Map 14           3055.00         13,820            606             212              212
> 2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10           1004.00         13,450            265               4                4
> 2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11           1006.00          4,290             71             216              212
> 2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2         132736.00      2,362,160         83,029     537,120,745      107,740,258
> 2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3          34248.00        643,350         20,661     107,740,470              203
> 2017-10-12T16:29:39,310  INFO [main] SessionState:  Reducer 4          11077.00         77,020          1,496             203               31
> 2017-10-12T16:29:39,311  INFO [main] SessionState:  Reducer 5           1003.00         40,030            824              10               10
> 2017-10-12T16:29:39,312  INFO [main] SessionState:  Reducer 6            439.00            590              0              10                0
> 2017-10-12T16:29:39,314  INFO [main] SessionState:  Reducer 7         140896.00      1,925,760         52,784     537,120,745      107,740,258
> 2017-10-12T16:29:39,316  INFO [main] SessionState:  Reducer 8          35260.00        590,200         22,331     107,740,470               76
> 2017-10-12T16:29:39,318  INFO [main] SessionState:  Reducer 9           8070.00         24,630            249              76                4
> 2017-10-12T16:29:39,318  INFO [main] SessionState: -------------------------------------------------------------------
> 
> From: Zhang, Liyun [mailto:liyun.zhang@intel.com]
> Sent: Thursday, October 12, 2017 4:40 PM
> To: dev@hive.apache.org
> Subject: How to measure the execution time of query on Hive on Tez
> 
> Hi  all:
>  Anyone knows how to view the detail execution time of every map/reduce task in hive on tez?
> I screenshot the result:
> Run DAG is  324.s . But this is not the sum of DURATION time of every tasks( 665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1).  So which parameter DURATION(ms) or CPU_TIME(ms) should be used?
> [cid:image001.png@01D34378.B0FDB5B0]
> 
> Appreciate to get some feedback from you!
> 
> 
> Best Regards
> Kelly Zhang/Zhang,Liyun
> 


Re: How to measure the execution time of query on Hive on Tez

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.
Hi

There are couple of tables that you are seeing, the top one is Query Execution Summary and next one is Task Execution Summary.

At the end of your query execution, you should see something like "Time taken: xx.xx seconds” in hive CLI or beeline. This represents the overall time the query took from start (start of semantic analysis) to finish (end of fetching results).

This overall time is broken down in the Query Execution Summary table. When you submit the query, query goes through multiple phases before returning the results. 
The time for each of these phases is defined in Query Execution Summary tables. 

If you want the query runtime then you should look at the last line in hive CLI or beeline which represents the end-to-end time.
If you are just interested in how long the query actually executed in the cluster then you should look at Run DAG time.

Run DAG is NOT sum of duration of all vertices as vertices can be scheduled in parallel. 
Run DAG is the time that Hive computes when it knows that DAG has been accepted and started by Application Master until the completion of DAG (could be failed, errored or succeeded).

Hope this helps.

Thanks
Prasanth

> On Oct 12, 2017, at 2:46 PM, Zhang, Liyun <li...@intel.com> wrote:
> 
> Hi all:
>   Maybe in last mail the attached picture is not shown.
> I re-described my question here.  I saw following statistics about the runtime when running query.
> 
> The Run DAG is 318s.  But it is not the sum of DURATION of all VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s
> Not the sum of CPU_TIME.   There are several indicator "RUN DAG","DURATION","CPU_TIME",  which indicator I should use when measure the performance? Sometimes I found there is significant improvement in sum of (CPU_TIME) while there is no significant improvement in "RUN DAG".  Is this normal?  Appreciate to get some feedback from you!
> 
> 
> 
> 2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION                            DURATION
> 2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query                           3.72s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan                            0.60s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan                             0.61s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG                               0.52s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG                               318.54s
> 2017-10-12T16:29:39,263  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,263  INFO [main] SessionState:
> 2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
> 2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
> 2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
> 2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
> 2017-10-12T16:29:39,298  INFO [main] SessionState:      Map 1          59549.00      1,355,520         28,565     550,076,554    1,602,119,842
> 2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 12           4069.00         15,670            522          73,049              732
> 2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 13           3055.00         14,030            567             212              212
> 2017-10-12T16:29:39,301  INFO [main] SessionState:     Map 14           3055.00         13,820            606             212              212
> 2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10           1004.00         13,450            265               4                4
> 2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11           1006.00          4,290             71             216              212
> 2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2         132736.00      2,362,160         83,029     537,120,745      107,740,258
> 2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3          34248.00        643,350         20,661     107,740,470              203
> 2017-10-12T16:29:39,310  INFO [main] SessionState:  Reducer 4          11077.00         77,020          1,496             203               31
> 2017-10-12T16:29:39,311  INFO [main] SessionState:  Reducer 5           1003.00         40,030            824              10               10
> 2017-10-12T16:29:39,312  INFO [main] SessionState:  Reducer 6            439.00            590              0              10                0
> 2017-10-12T16:29:39,314  INFO [main] SessionState:  Reducer 7         140896.00      1,925,760         52,784     537,120,745      107,740,258
> 2017-10-12T16:29:39,316  INFO [main] SessionState:  Reducer 8          35260.00        590,200         22,331     107,740,470               76
> 2017-10-12T16:29:39,318  INFO [main] SessionState:  Reducer 9           8070.00         24,630            249              76                4
> 2017-10-12T16:29:39,318  INFO [main] SessionState: -------------------------------------------------------------------
> 
> From: Zhang, Liyun [mailto:liyun.zhang@intel.com]
> Sent: Thursday, October 12, 2017 4:40 PM
> To: dev@hive.apache.org
> Subject: How to measure the execution time of query on Hive on Tez
> 
> Hi  all:
>  Anyone knows how to view the detail execution time of every map/reduce task in hive on tez?
> I screenshot the result:
> Run DAG is  324.s . But this is not the sum of DURATION time of every tasks( 665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1).  So which parameter DURATION(ms) or CPU_TIME(ms) should be used?
> [cid:image001.png@01D34378.B0FDB5B0]
> 
> Appreciate to get some feedback from you!
> 
> 
> Best Regards
> Kelly Zhang/Zhang,Liyun
> 


RE: How to measure the execution time of query on Hive on Tez

Posted by "Zhang, Liyun" <li...@intel.com>.
Hi all:
   Maybe in last mail the attached picture is not shown.
I re-described my question here.  I saw following statistics about the runtime when running query.

The Run DAG is 318s.  But it is not the sum of DURATION of all VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s
Not the sum of CPU_TIME.   There are several indicator "RUN DAG","DURATION","CPU_TIME",  which indicator I should use when measure the performance? Sometimes I found there is significant improvement in sum of (CPU_TIME) while there is no significant improvement in "RUN DAG".  Is this normal?  Appreciate to get some feedback from you!



2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION                            DURATION
2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query                           3.72s
2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan                            0.60s
2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan                             0.61s
2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG                               0.52s
2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG                               318.54s
2017-10-12T16:29:39,263  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,263  INFO [main] SessionState:
2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,298  INFO [main] SessionState:      Map 1          59549.00      1,355,520         28,565     550,076,554    1,602,119,842
2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 12           4069.00         15,670            522          73,049              732
2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 13           3055.00         14,030            567             212              212
2017-10-12T16:29:39,301  INFO [main] SessionState:     Map 14           3055.00         13,820            606             212              212
2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10           1004.00         13,450            265               4                4
2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11           1006.00          4,290             71             216              212
2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2         132736.00      2,362,160         83,029     537,120,745      107,740,258
2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3          34248.00        643,350         20,661     107,740,470              203
2017-10-12T16:29:39,310  INFO [main] SessionState:  Reducer 4          11077.00         77,020          1,496             203               31
2017-10-12T16:29:39,311  INFO [main] SessionState:  Reducer 5           1003.00         40,030            824              10               10
2017-10-12T16:29:39,312  INFO [main] SessionState:  Reducer 6            439.00            590              0              10                0
2017-10-12T16:29:39,314  INFO [main] SessionState:  Reducer 7         140896.00      1,925,760         52,784     537,120,745      107,740,258
2017-10-12T16:29:39,316  INFO [main] SessionState:  Reducer 8          35260.00        590,200         22,331     107,740,470               76
2017-10-12T16:29:39,318  INFO [main] SessionState:  Reducer 9           8070.00         24,630            249              76                4
2017-10-12T16:29:39,318  INFO [main] SessionState: -------------------------------------------------------------------

From: Zhang, Liyun [mailto:liyun.zhang@intel.com]
Sent: Thursday, October 12, 2017 4:40 PM
To: dev@hive.apache.org
Subject: How to measure the execution time of query on Hive on Tez

Hi  all:
  Anyone knows how to view the detail execution time of every map/reduce task in hive on tez?
I screenshot the result:
Run DAG is  324.s . But this is not the sum of DURATION time of every tasks( 665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1).  So which parameter DURATION(ms) or CPU_TIME(ms) should be used?
[cid:image001.png@01D34378.B0FDB5B0]

Appreciate to get some feedback from you!


Best Regards
Kelly Zhang/Zhang,Liyun


RE: How to measure the execution time of query on Hive on Tez

Posted by "Zhang, Liyun" <li...@intel.com>.
Hi all:
   Maybe in last mail the attached picture is not shown.
I re-described my question here.  I saw following statistics about the runtime when running query.

The Run DAG is 318s.  But it is not the sum of DURATION of all VERTICES((59549+4069+3055+3055+1004+1006+132736+34248+11077+1003+439+140896+35260+8070)/1000=435s
Not the sum of CPU_TIME.   There are several indicator "RUN DAG","DURATION","CPU_TIME",  which indicator I should use when measure the performance? Sometimes I found there is significant improvement in sum of (CPU_TIME) while there is no significant improvement in "RUN DAG".  Is this normal?  Appreciate to get some feedback from you!



2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,262  INFO [main] SessionState: OPERATION                            DURATION
2017-10-12T16:29:39,262  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,263  INFO [main] SessionState: Compile Query                           3.72s
2017-10-12T16:29:39,263  INFO [main] SessionState: Prepare Plan                            0.60s
2017-10-12T16:29:39,263  INFO [main] SessionState: Submit Plan                             0.61s
2017-10-12T16:29:39,263  INFO [main] SessionState: Start DAG                               0.52s
2017-10-12T16:29:39,263  INFO [main] SessionState: Run DAG                               318.54s
2017-10-12T16:29:39,263  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,263  INFO [main] SessionState:
2017-10-12T16:29:39,289  INFO [cea2258c-aa47-46a1-af5b-39860a6edbb3 main] counters.Limits: Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=2000
2017-10-12T16:29:39,294  INFO [main] SessionState: Task Execution Summary
2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,294  INFO [main] SessionState:   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
2017-10-12T16:29:39,294  INFO [main] SessionState: ----------------------------------------------------------------------------------------------
2017-10-12T16:29:39,298  INFO [main] SessionState:      Map 1          59549.00      1,355,520         28,565     550,076,554    1,602,119,842
2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 12           4069.00         15,670            522          73,049              732
2017-10-12T16:29:39,300  INFO [main] SessionState:     Map 13           3055.00         14,030            567             212              212
2017-10-12T16:29:39,301  INFO [main] SessionState:     Map 14           3055.00         13,820            606             212              212
2017-10-12T16:29:39,303  INFO [main] SessionState: Reducer 10           1004.00         13,450            265               4                4
2017-10-12T16:29:39,305  INFO [main] SessionState: Reducer 11           1006.00          4,290             71             216              212
2017-10-12T16:29:39,307  INFO [main] SessionState:  Reducer 2         132736.00      2,362,160         83,029     537,120,745      107,740,258
2017-10-12T16:29:39,308  INFO [main] SessionState:  Reducer 3          34248.00        643,350         20,661     107,740,470              203
2017-10-12T16:29:39,310  INFO [main] SessionState:  Reducer 4          11077.00         77,020          1,496             203               31
2017-10-12T16:29:39,311  INFO [main] SessionState:  Reducer 5           1003.00         40,030            824              10               10
2017-10-12T16:29:39,312  INFO [main] SessionState:  Reducer 6            439.00            590              0              10                0
2017-10-12T16:29:39,314  INFO [main] SessionState:  Reducer 7         140896.00      1,925,760         52,784     537,120,745      107,740,258
2017-10-12T16:29:39,316  INFO [main] SessionState:  Reducer 8          35260.00        590,200         22,331     107,740,470               76
2017-10-12T16:29:39,318  INFO [main] SessionState:  Reducer 9           8070.00         24,630            249              76                4
2017-10-12T16:29:39,318  INFO [main] SessionState: -------------------------------------------------------------------

From: Zhang, Liyun [mailto:liyun.zhang@intel.com]
Sent: Thursday, October 12, 2017 4:40 PM
To: dev@hive.apache.org
Subject: How to measure the execution time of query on Hive on Tez

Hi  all:
  Anyone knows how to view the detail execution time of every map/reduce task in hive on tez?
I screenshot the result:
Run DAG is  324.s . But this is not the sum of DURATION time of every tasks( 665=163+22+1+1+2+3+143+31+4+0+254+29+8+2+1+1).  So which parameter DURATION(ms) or CPU_TIME(ms) should be used?
[cid:image001.png@01D34378.B0FDB5B0]

Appreciate to get some feedback from you!


Best Regards
Kelly Zhang/Zhang,Liyun