You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/11/10 04:57:00 UTC

[jira] [Commented] (IMPALA-3816) Codegen perf-critical loops in Sorter

    [ https://issues.apache.org/jira/browse/IMPALA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228970#comment-17228970 ] 

ASF subversion and git services commented on IMPALA-3816:
---------------------------------------------------------

Commit 1337adff6cc0c59a0f066b5c9a64352d929627f2 in impala's branch refs/heads/master from Qifan Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1337adf ]

IMPALA-3816: Codegen perf critical loops in Sorter

This fix added the functionality to codegen recursive method
Sorter::TupleSorter::SortHelper() in sorter, which improves the
performance for both the sort and the partial sort operators.

In one unit test to order 7300 rows from table functional.alltypes,
the speedup of the code-gen version over non-code-gen version of
the method is around 65%. In another unit test to partially
order 2880404 rows, the speedup is around 61%.

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: Ie08137449d4a7b554ca8b8650260f8bd72e0a81b
Reviewed-on: http://gerrit.cloudera.org:8080/16621
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Codegen perf-critical loops in Sorter
> -------------------------------------
>
>                 Key: IMPALA-3816
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3816
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.7.0
>            Reporter: Tim Armstrong
>            Assignee: Qifan Chen
>            Priority: Minor
>              Labels: codegen
>         Attachments: 7e455a394bf2622b_adcea7ad00000000_opt.ll, percentile query profile.txt, tpch_30.txt
>
>
> In the sorter, we codegen the comparator function but call it indirectly via a function pointer. We should consider codegening the perf-critical loops so that we can make the comparator function call direct and inlinable. Inlining the comparison will be very beneficial if it is trivial, e.g. order by a numeric column: I expect sorts on simple keys will get noticably faster.
> We should also be able to get rid of FreeLocalAllocations() calls for most comparators, although I'm not sure what the best way to approach that is.
> The Partition() loop is the most perf-critical, followed by InsertionSort().
> We also don't do this yet for the TopN node, see IMPALA-3815.
> Mostafa's analysis:
> While evaluating Sort performance I noticed that the codegened compare function is not inlined which results in large overhead per row. 
> Expected speedup is 10-15%
> {code}
>   /// Returns a negative value if lhs is less than rhs, a positive value if lhs is
>   /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and
>   /// ordering_exprs_rhs_) must have been prepared and opened before calling this,
>   /// i.e. 'sort_key_exprs' in the constructor must have been opened.
>   int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {
>     return codegend_compare_fn_ == NULL ?
>         CompareInterpreted(lhs, rhs) :
>         (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),
>             ordering_expr_evals_rhs_.data(), lhs, rhs);
>   } 
> {code}
> From Perf
> {code}
>       │    bool Sorter::TupleSorter::Less(const TupleRow* lhs, const TupleRow* rhs) {                                                                                                                                                           ▒
>   7.43 │      push   %rbp                                                                                                                                                                                                                        ▒
>   3.23 │      mov    %rsp,%rbp                                                                                                                                                                                                                   ▒
>   9.44 │      push   %r12                                                                                                                                                                                                                        ▒
>   2.69 │      push   %rbx                                                                                                                                                                                                                        ▒
>   3.89 │      mov    %rsi,%r12                                                                                                                                                                                                                   ▒
>   2.98 │      mov    %rdi,%rbx                                                                                                                                                                                                                   ▒
>   6.06 │      sub    $0x10,%rsp                                                                                                                                                                                                                  ◆
>        │      --num_comparisons_till_free_;                                                                                                                                                                                                      ▒
>        │      DCHECK_GE(num_comparisons_till_free_, 0);                                                                                                                                                                                          ▒
>        │      if (UNLIKELY(num_comparisons_till_free_ == 0)) {                                                                                                                                                                                   ▒
>   3.75 │      subl   $0x1,0x18(%rdi)                                                                                                                                                                                                             ▒
>   9.42 │    ↓ je     58                                                                                                                                                                                                                          ▒
>        │        parent_->expr_results_pool_.Clear();                                                                                                                                                                                             ▒
>        │        num_comparisons_till_free_ = state_->batch_size();                                                                                                                                                                               ▒
>        │      }                                                                                                                                                                                                                                  ▒
>        │      return comparator_.Less(lhs, rhs);                                                                                                                                                                                                 ▒
>   1.09 │17:   mov    0x10(%rbx),%rdi                                                                                                                                                                                                             ▒
>        │      /// Returns a negative value if lhs is less than rhs, a positive value if lhs is                                                                                                                                                   ▒
>        │      /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and                                                                                                                                                   ▒
>        │      /// ordering_exprs_rhs_) must have been prepared and opened before calling this,                                                                                                                                                   ▒
>        │      /// i.e. 'sort_key_exprs' in the constructor must have been opened.                                                                                                                                                                ▒
>        │      int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {                                                                                                                                                        ▒
>        │        return codegend_compare_fn_ == NULL ?                                                                                                                                                                                            ▒
>   2.69 │      mov    0x58(%rdi),%rax                                                                                                                                                                                                             ▒
>        │            CompareInterpreted(lhs, rhs) :                                                                                                                                                                                               ▒
>        │            (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),                                                                                                                                                                     ▒
>        │                ordering_expr_evals_rhs_.data(), lhs, rhs);                                                                                                                                                                              ▒
>   5.43 │      test   %rax,%rax                                                                                                                                                                                                                   ▒
>        │    ↓ je     40                                                                                                                                                                                                                          ▒
>   6.85 │      mov    0x20(%rdi),%rsi                                                                                                                                                                                                             ▒
>   0.86 │      mov    %rdx,%rcx                                                                                                                                                                                                                   ▒
>   2.55 │      mov    0x8(%rdi),%rdi                                                                                                                                                                                                              ▒
>   3.38 │      mov    %r12,%rdx                                                                                                                                                                                                                   ▒
>   6.10 │    → callq  *(%rax)                                                                                                                                                                                                                     ▒
>        │    }                                                                                                                                                                                                                                    ▒
>   5.84 │      add    $0x10,%rsp                                                                                                                                                                                                                  ▒
>        │      /// All exprs (ordering_exprs_lhs_ and ordering_exprs_rhs_) must have been prepared                                                                                                                                                ▒
>        │      /// and opened before calling this.                                                                                                                                                                                                ▒
>        │      /// Force inlining because it tends not to be always inlined at callsites, even in                                                                                                                                                 ▒
>        │      /// hot loops.                                                                                                                                                                                                                     ▒
>        │      bool ALWAYS_INLINE Less(const TupleRow* lhs, const TupleRow* rhs) const {                                                                                                                                                          ▒
>        │        return Compare(lhs, rhs) < 0;                                                                                                                                                                                                    ▒
>   1.77 │      shr    $0x1f,%eax                                                                                                                                                                                                                  ▒
>   7.91 │      pop    %rbx                                                                                                                                                                                                                        ▒
>   4.11 │      pop    %r12                                                                                                                                                                                                                        ▒
>   0.49 │      pop    %rbp                                                                                                                                                                                                                        ▒
>   1.75 │    ← retq                                                                                                                                                                                                                               ▒
>        │      /// i.e. 'sort_key_exprs' in the constructor must have been opened.                                                                                                                                                                ▒
>        │      int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {                                                                                                                                                        ▒
>        │        return codegend_compare_fn_ == NULL ?                                                                                                                                                                                            ▒
>        │            CompareInterpreted(lhs, rhs) :                                                                                                                                                                                               ▒
>        │            (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),                                                                                                                                                                     ▒
>        │                ordering_expr_evals_rhs_.data(), lhs, rhs);                                                                                                                                                                              ▒
>        │40:   mov    %r12,%rsi                                                                                                                                                                                                                   ▒
>        │    → callq  impala::TupleRowComparator::CompareInterpreted(impala::TupleRow const*, impala::TupleRow const*) const                                                                                                                      ▒
>        │      add    $0x10,%rsp                                                                                                                                                                                                                  ▒
>        │      /// All exprs (ordering_exprs_lhs_ and ordering_exprs_rhs_) must have been prepared                                                                                                                                                ▒
> Press 'h' for help on key bindings
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org