You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/11/20 18:36:00 UTC

[jira] [Resolved] (IMPALA-5976) Remove equivalence classes

     [ https://issues.apache.org/jira/browse/IMPALA-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang resolved IMPALA-5976.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0


Reply
More  
IMPALA-5976: Remove equivalence class computation in FE

Equivalent class is used to get the equivalencies between slots. It is
ill-defined and the current implementation is inefficient. This patch
removes it and directly uses the information from the value transfer
graph instead.
Value transfer graph is reimplemented using Tarjan's strongly connected
component algorithm and BFS with adjacency lists to speed up on both
condensed and sparse graphs.

Testing: It passes the existing tests. In planner tests the equivalence
between SCC-condensed graph and uncondensed graph is checked. A test
case is added for a helper class IntArrayList. An outer-join edge case
is added in planner test. On a query with 1800 union operations, the
equivalence class computation time is reduced from 7m57s to 65ms and the
planning time is reduced from 8m5s to 13s.

Change-Id: If4cb1d8be46efa8fd61a97048cc79dabe2ffa51a
Reviewed-on: http://gerrit.cloudera.org:8080/8317
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins

> Remove equivalence classes
> --------------------------
>
>                 Key: IMPALA-5976
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5976
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: Alexander Behm
>            Assignee: Tianyi Wang
>            Priority: Critical
>             Fix For: Impala 2.11.0
>
>         Attachments: queryA.sql, queryB.sql
>
>
> The simplest solution to speed up the equivalence-class computation is to remove the equivalence class altogether.
> A few points for justification:
> * The purpose of the equivalence classes is to provide a condensed version of the value-transfer graph to speed up certain equivalence checks/transformations. All necessary information is already captured in the value-transfer graph, so the high cost of computing the equivalence classes is not justified.
> * Incomplete/wrong definition of equivalence classes. In today's code, the equivalence class of a slot contains a list of slots where each pair of slots has a value transfer. This definition does not make sense, in particular, for unions which produce value transfers of the form s1 -> s2, s1 -> s3 where according to our definition s1 can only be in the same class as s2 or s3, but not both (because s2 and s3 do not have a value transfer). We could redefine the equivalence class of only containing slot pairs with mutual value transfers, but that information is already efficiently accessible from the value-transfer graph.
> The challenge in removing the equivalence classes is to rework all the existing code that relies on them to use the value-transfer graph directly.
> One of the tricky functions that needs to change is Analyzer.equivExprs(). An alternative implementation might look like this:
> {code}
> // Returns true if 'other' is equivalent to this. Equivalence means that the two expr trees are equal except for SlotRefs which match if there
> // is another SlotRef in the other expr tree at the same position, and the underlying slots have a value transfer. If 'mutualOnly' is true,
> // SlotRefs are only considered equivalent if they there is a mutual value transfer.
> boolean Expr.isEquiv(Expr other, boolean mutualOnly);
> {code}
> The existing equals() function can probably be refactored into a new function that can be called from both equals() and isEquiv() or something along those lines.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)