Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/04/15 16:58:00 UTC

[jira] [Commented] (IMPALA-9422) Improve join builder profiles

    [ https://issues.apache.org/jira/browse/IMPALA-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084245#comment-17084245 ] 

ASF subversion and git services commented on IMPALA-9422:
---------------------------------------------------------

Commit bd4458b7a92910178a3f92cec888e83172899408 in impala's branch refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bd4458b ]

IMPALA-9422: Re-visit and improve join node and builder's counters

This patch makes the following changes:
- All code inside the builder that is executed exactly once during
query execution is attributed to the builder. This also includes
public calls to the builder that are used as synchronization points
for shared builds. The serial execution phase in these methods is
always executed once regardless of the builder execution mode (namely
single-threaded, parallel execution, separate build sink).
- Also makes sure there is no double counting of total time in the builder.
- The BuildTime counter has been removed from the join node's profile in
favor of the builder's total time.
- BuildRowsPartitioned from the builder is equivalent to BuildRows in
the join node and hence that counter has been moved to the builders.

- Also fixed a bug in RuntimeProfile where 'non-child' and '%non-child'
were not computed after a profile object was created from its thrift
representation.
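
For readers unfamiliar with the 'non-child' values mentioned above, the
following is a minimal Python sketch of how such a value can be derived
from a profile tree: a node's total time minus the total time of its
direct children. It is not Impala's RuntimeProfile code. The ProfileNode
class and compute_non_child function are hypothetical names, and taking
the percentage relative to the node's own total is an assumption made
for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileNode:
    """Hypothetical stand-in for one node of a runtime profile tree."""
    name: str
    total_ns: int
    children: List["ProfileNode"] = field(default_factory=list)
    non_child_ns: int = 0
    non_child_pct: float = 0.0


def compute_non_child(node: ProfileNode) -> None:
    """Fill in non_child_ns and non_child_pct for node and its subtree."""
    children_total = 0
    for child in node.children:
        compute_non_child(child)
        children_total += child.total_ns
    # Clamp at zero so timer skew cannot produce a negative value.
    node.non_child_ns = max(0, node.total_ns - children_total)
    # Assumption: the percentage is taken against the node's own total.
    if node.total_ns > 0:
        node.non_child_pct = 100.0 * node.non_child_ns / node.total_ns
    else:
        node.non_child_pct = 0.0
```

A node without children ends up with non-child time equal to its total
time, i.e. 100%. The point of the fix described above is that a step like
this also has to run on a profile rebuilt from its serialized form, not
only on one built locally.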

An example of the new profiles:

HASH_JOIN_NODE (id=2):(Total: 147.531ms, non-child: 2.282ms, % non-child: 0.80%)
  ExecOption: Codegen Disabled: disabled due to optimization hints, Join Build-Side Prepared Asynchronously
  Node Lifecycle Event Timeline: 148.284ms
     - Open Started: 1.511ms (1.511ms)
     - Open Finished: 147.923ms (146.411ms)
     - First Batch Requested: 147.946ms (23.488us)
     - First Batch Returned: 148.137ms (190.470us)
     - Last Batch Returned: 148.207ms (69.869us)
     - Closed: 148.284ms (77.131us)
   - PeakMemoryUsage: 1.98 MB (2074880)
   - ProbeRows: 31 (31)
   - ProbeRowsPartitioned: 0 (0)
   - ProbeTime: 25.579us
   - RowsReturned: 31 (31)
   - RowsReturnedRate: 210.00 /sec
  Buffer pool:
     - AllocTime: 31.986us
     - CompressionTime: 0.000ns
     - CumulativeAllocationBytes: 1.00 MB (1048576)
     - CumulativeAllocations: 16 (16)
     - EncryptionTime: 0.000ns
     - PeakReservation: 1.94 MB (2031616)
     - PeakUnpinnedBytes: 0
     - PeakUsedReservation: 1.00 MB (1048576)
     - ReadIoBytes: 0
     - ReadIoOps: 0 (0)
     - ReadIoWaitTime: 0.000ns
     - SystemAllocTime: 23.944us
     - WriteIoBytes: 0
     - WriteIoOps: 0 (0)
     - WriteIoWaitTime: 0.000ns
  Hash Join Builder (join_node_id=2):(Total: 1.617ms, non-child: 1.617ms, % non-child: 100.00%)
    ExecOption: Codegen Disabled: disabled due to optimization hints
    Runtime filters: 1 of 1 Runtime Filter Published
     - BuildRows: 31 (31)
     - BuildRowsPartitionTime: 153.742us
     - HashTablesBuildTime: 361.306us
     - LargestPartitionPercent: 9 (9)
     - MaxPartitionLevel: 0 (0)
     - NumHashTableBuildsSkipped: 0 (0)
     - NumRepartitions: 0 (0)
     - PartitionsCreated: 16 (16)
     - PeakMemoryUsage: 17.12 KB (17536)
     - RepartitionTime: 0.000ns
     - SpilledPartitions: 0 (0)
    Hash Table:
       - HashBuckets: 48 (48)
       - HashCollisions: 0 (0)
       - Probes: 62 (62)
       - Resizes: 0 (0)
       - Travel: 42 (42)
  EXCHANGE_NODE (id=4):(Total: 138.859ms, non-child: 81.310us, % non-child: 0.06%)

NESTED_LOOP_JOIN_NODE (id=3):(Total: 1s915ms, non-child: 906.707ms, % non-child: 47.34%)
  ExecOption: Join Build-Side Prepared Asynchronously
  Node Lifecycle Event Timeline: 2s190ms
     - Open Started: 254.677ms (254.677ms)
     - Open Finished: 2s016ms (1s762ms)
     - First Batch Requested: 2s016ms (5.231us)
     - First Batch Returned: 2s017ms (497.212us)
     - Last Batch Returned: 2s186ms (169.010ms)
     - Closed: 2s190ms (3.570ms)
   - PeakMemoryUsage: 51.75 MB (54263808)
   - ProbeRows: 3 (3)
   - ProbeTime: 0.000ns
   - RowsReturned: 301.15K (301146)
   - RowsReturnedRate: 157.24 K/sec
  Nested Loop Join Builder:(Total: 753.036ms, non-child: 753.036ms, % non-child: 100.00%)
     - BuildRows: 100.38K (100382)
     - PeakMemoryUsage: 51.72 MB (54235136)
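
To compare builder and join node times across profiles like the ones
above, the header lines can be parsed mechanically. The sketch below is
a small Python helper, not part of Impala. The function names are
hypothetical, and the unit set (h, m, s, ms, us, ns) is an assumption
that covers the values shown in this message.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Nanoseconds per unit. Assumed unit set; only s, ms, us and ns appear above.
_UNIT_NS = {
    "h": 3_600_000_000_000,
    "m": 60_000_000_000,
    "s": 1_000_000_000,
    "ms": 1_000_000,
    "us": 1_000,
    "ns": 1,
}
# Two-letter units come first so that "ms" is not read as "m".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|us|ns|h|m|s))+")
_HEADER = re.compile(
    r"^\s*(?P<name>.+?):\(Total: (?P<total>[^,\s]+), "
    r"non-child: (?P<non_child>[^,\s]+), "
    r"% non-child: (?P<pct>[\d.]+)%\)\s*$"
)


def parse_duration_ns(text: str) -> int:
    """Convert a printed duration such as '1s915ms' to nanoseconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    total = Decimal(0)
    for value, unit in _PART.findall(text):
        total += Decimal(value) * _UNIT_NS[unit]
    return int(total)


def parse_header(line: str) -> Optional[Tuple[str, int, int, float]]:
    """Return (name, total_ns, non_child_ns, pct), or None for other lines."""
    match = _HEADER.match(line)
    if match is None:
        return None
    return (
        match.group("name"),
        parse_duration_ns(match.group("total")),
        parse_duration_ns(match.group("non_child")),
        float(match.group("pct")),
    )
```

Decimal is used instead of float so that values such as 147.531ms convert
to an exact integer number of nanoseconds. Counter lines such as
"- ProbeRows: 31 (31)" do not match the header pattern and yield None.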

Change-Id: I604075a2c8efcff26705fb39672f29f309b2ed97
Reviewed-on: http://gerrit.cloudera.org:8080/15663
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Improve join builder profiles
> -----------------------------
>
>                 Key: IMPALA-9422
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9422
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>            Priority: Major
>              Labels: multithreading
>
> We should clean up/improve the join builder profiles for the separate build.
> First, for the separate build, we should ensure that all time spent in the builder is counted against the builder. E.g. calls into public methods like BeginSpilledProbe(). These should be counted as idle time for the actual join implementation, so that we can see that the time is spent in the (serial) builder instead of the (parallel) probe.
> We might need to fix things like Send() being called by RepartitionBuildInput, resulting in double counting.
> Second, we should revisit the assortment of timers - BuildRowsPartitionTime, HashTablesBuildTime, RepartitionTime. Maybe it makes sense to make them child counters of total time to make the relationship clearer.


