Posted to issues@impala.apache.org by "Thomas Tauber-Marshall (JIRA)" <ji...@apache.org> on 2017/09/28 15:39:00 UTC

[jira] [Resolved] (IMPALA-5870) Partial sort profile counters don't make sense for partial sort

     [ https://issues.apache.org/jira/browse/IMPALA-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Tauber-Marshall resolved IMPALA-5870.
--------------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

commit 4d49099a8bbea3f24f53272f321a19266dc932b8
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
Date:   Thu Sep 21 12:04:25 2017 -0700

    IMPALA-5870: Improve runtime profile for partial sort
    
    A recent change (IMPALA-5498) added the ability to do partial sorts,
    which divide their input into runs, each of which is sorted
    individually, avoiding the need to spill. Some of the debug output
    was carried over unchanged from regular sorts, leading to confusion.
    
    This patch removes the counters 'SpilledRuns' and 'TotalMergesPerformed'
    since they will always be 0, and it renames the 'InitialRunsCreated'
    counter to 'RunsCreated' since the 'Initial' refers to the fact that
    in a regular sort those runs may be spilled or merged.
    
    It also adds a profile info string 'SortType' that can take the values
    'Total', 'TopN', or 'Partial' to reflect the type of exec node being
    used.
    
    Example profile snippet for a partial sort:
    SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
     SortType: Partial
     ExecOption: Codegen Enabled
        - NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1)
        - InMemorySortTime: 34.201us
        - PeakMemoryUsage: 2.02 MB (2117632)
        - RowsReturned: 44 (44)
        - RowsReturnedRate: 109.11 K/sec
        - RunsCreated: 1 (1)
        - SortDataSize: 572.00 B (572)
    
    Testing:
    - Manually ran several sorting queries and inspected their profiles
    - Updated a kudu_insert test that relied on the 'SpilledRuns' counter
      being 0 for a partial sort.
    
    Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
    Reviewed-on: http://gerrit.cloudera.org:8080/8123
    Reviewed-by: Tim Armstrong <ta...@cloudera.com>
    Tested-by: Impala Public Jenkins
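
For readers unfamiliar with the distinction, the partial sort described in the commit message above can be sketched roughly as follows. This is an illustrative Python sketch, not Impala's C++ implementation: the function name `partial_sort`, the `run_size` parameter, and the returned counter dictionary are all assumptions made for the example. A fixed row count stands in for whatever limit bounds a run in the real exec node.

```python
from typing import Dict, Iterable, List, Tuple


def partial_sort(rows: Iterable[int], run_size: int) -> Tuple[List[int], Dict[str, object]]:
    """Sort `rows` in independent runs of at most `run_size` rows.

    Each run is sorted in memory and emitted before the next run starts,
    so nothing is spilled or merged. The output is sorted only within
    each run, not globally.
    """
    if run_size <= 0:
        raise ValueError("run_size must be positive")

    output: List[int] = []
    # Hypothetical counters mirroring the names used in the commit message.
    counters: Dict[str, object] = {"SortType": "Partial", "RunsCreated": 0}
    run: List[int] = []

    for row in rows:
        run.append(row)
        if len(run) == run_size:
            # The run is full: sort it, emit it, and start a new one.
            output.extend(sorted(run))
            counters["RunsCreated"] += 1
            run = []

    if run:
        # Emit the final, possibly shorter, run.
        output.extend(sorted(run))
        counters["RunsCreated"] += 1

    return output, counters


if __name__ == "__main__":
    out, counters = partial_sort([5, 3, 9, 1, 8, 2, 7], run_size=3)
    print(out)       # [3, 5, 9, 1, 2, 8, 7]
    print(counters)  # {'SortType': 'Partial', 'RunsCreated': 3}
```

Because no run is ever spilled or merged in this scheme, counters such as 'SpilledRuns' would stay at 0, which is why a single 'RunsCreated' count is the more meaningful number here.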

> Partial sort profile counters don't make sense for partial sort
> ---------------------------------------------------------------
>
>                 Key: IMPALA-5870
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5870
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Matthew Jacobs
>            Assignee: Thomas Tauber-Marshall
>              Labels: bugbash-2017-08-30, kudu, supportability
>             Fix For: Impala 2.11.0
>
>
> The profile counters for the partial sort don't all make sense:
> {code}
>       SORT_NODE (id=2):(Total: 13s492ms, non-child: 2s098ms, % non-child: 15.55%)
>          - InMemorySortTime: 880.839ms
>          - InitialRunsCreated: 1 (1)
>          - PeakMemoryUsage: 93.48 MB (98016393)
>          - RowsReturned: 804.95K (804951)
>          - RowsReturnedRate: 59.53 K/sec
>          - SortDataSize: 89.40 MB (93742171)
>          - SpilledRuns: 0 (0)
>          - TotalMergesPerformed: 0 (0)
> {code}
> We should probably indicate this is a partial sort, and consider:
> * rename/remove InitialRunsCreated/SpilledRuns - we care here about the number of partial sorts, not sure how to phrase that
> * check if this makes sense: SortDataSize
> * remove TotalMergesPerformed
> We should also address misleading info in the plan (in addition to the profile mentioned above). See [~jyu@cloudera.com]'s comment.
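
To check a saved profile for the counters this issue is about, something like the following minimal sketch could be used. It is not an Impala tool: `parse_counters`, `stale_counters`, and the two lookup tables are hypothetical helpers written for this example, and the counter names are copied from the profile quoted above. The parser only handles simple "- Name: value" lines.

```python
import re
from typing import Dict, List

# Matches profile counter lines such as "   - SpilledRuns: 0 (0)".
COUNTER_RE = re.compile(r"^\s*-\s*(\w+):\s*(.+?)\s*$")

# Counter names as they appear in the profile quoted in the issue.
REMOVED_FOR_PARTIAL = {"SpilledRuns", "TotalMergesPerformed"}
RENAMED_FOR_PARTIAL = {"InitialRunsCreated": "RunsCreated"}


def parse_counters(profile: str) -> Dict[str, str]:
    """Map counter name -> raw value string for each '- Name: value' line."""
    counters: Dict[str, str] = {}
    for line in profile.splitlines():
        # Tolerate the "> " quoting used in the issue description.
        match = COUNTER_RE.match(line.lstrip("> "))
        if match:
            counters[match.group(1)] = match.group(2)
    return counters


def stale_counters(counters: Dict[str, str]) -> List[str]:
    """Return the counters that the fix removes or renames for a partial sort."""
    stale = REMOVED_FOR_PARTIAL | set(RENAMED_FOR_PARTIAL)
    return sorted(name for name in counters if name in stale)


OLD_PROFILE = """
>          - InMemorySortTime: 880.839ms
>          - InitialRunsCreated: 1 (1)
>          - SpilledRuns: 0 (0)
>          - TotalMergesPerformed: 0 (0)
"""

if __name__ == "__main__":
    print(stale_counters(parse_counters(OLD_PROFILE)))
    # ['InitialRunsCreated', 'SpilledRuns', 'TotalMergesPerformed']
```

A profile produced after the fix should yield an empty list for a partial sort node.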



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)