You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/02/02 02:31:00 UTC

[jira] [Created] (IMPALA-8156) Add format options to the EXPLAIN statement

Paul Rogers created IMPALA-8156:
-----------------------------------

             Summary: Add format options to the EXPLAIN statement
                 Key: IMPALA-8156
                 URL: https://issues.apache.org/jira/browse/IMPALA-8156
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


The EXPLAIN statement is very basic:

{code:sql}
EXPLAIN <stmt>;
{code}

Example:

{code:sql}
EXPLAIN SELECT * FROM alltypes;
{code}

Explain does provide some options set as session options:

{code:sql}
SET set explain_level=extended;
EXPLAIN <stmt>;
{code}

We have often found the need for additional information. For example, it would be very useful to obtain the SELECT statement after view substitution.

We wish to extend EXPLAIN to allow additional options, while retaining full backward compatibility. The extended syntax is:

{code:sql}
EXPLAIN [FORMAT([opt(, opt)*])] <stmt>;
{code}

This syntax reuses the existing FORMAT keyword, and allow an unlimited set of options to be added in the future without the need to define new keywords.

Options are in the {{name=value}} form with {{name}} as an identifier and {{value}} as a string literal. Both are case-insensitive. Example to set the explain level:

{code:sql}
EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
{code}

The two options supported at present are:

* {{level}} - Sets the explain level.
* {{rewritten}} - Shows the fully rewritten SQL statement with views expanded.

The {{level}} option overrides the existing session options. If {{level}} is not present, then the session option is used instead.

The {{rewritten}} option takes two values: {{true}} or {{false}}. If set, {{EXPLAIN}} returns the text of the rewritten SQL instead of the query plan. Example:

{noformat}
functional> explain format(rewritten) SELECT * FROM view_view;

+------------------------------------------------+
| Explain String                                 |
+------------------------------------------------+
| SELECT * FROM /* functional.view_view */ (     |
| SELECT * FROM /* functional.alltypes_view */ ( |
| SELECT * FROM functional.alltypes)             |
| )                                              |
+------------------------------------------------+
{noformat}

Here, the names in comments are the view names. Views are then expanded inline to show the full extend of the statement. This is very helpful to resolve user issues.

h4. Comparison with Other SQL Dialects

The ISO SQL standard does not define the {{EXPLAIN}} statement, it is a vendor extension. MySQL defines {{EXPLAIN}} as:

{noformat}
{EXPLAIN | DESCRIBE | DESC}
    [explain_type]
    {explainable_stmt | FOR CONNECTION connection_id}

explain_type: {
    FORMAT = format_name
}

format_name: {
    TRADITIONAL
  | JSON
}
{noformat}

That is, MySQL also uses the {{FORMAT}} keyword with only two choices.

SqlServer uses a form much like Impala's present form with no options.

Postgres uses options and keywords:

{noformat}
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:

    ANALYZE [ boolean ]
    VERBOSE [ boolean ]
    COSTS [ boolean ]
    BUFFERS [ boolean ]
    FORMAT { TEXT | XML | JSON | YAML }
{noformat}

Apache Drill uses a series of keywords to express options:

{noformat}
explain plan [ including all attributes ]
             [ with implementation | without implementation ]
             for <query> ;
{noformat}

We claim that, given the wide variety of vendor implementations, the proposed Impala syntax is reasonable.

h4. Futures

IMPALA-5973 proposes to add a JSON format for {{EXPLAIN}} output. We propose to select JSON output using the "format" option:

{code:sql}
EXPLAIN FORMAT(format='json') <stmt>
{code}

The format can be combined other options such as level:

{code:sql}
EXPLAIN FORMAT(format='json', level='extended') <stmt>
{code}

h4. Details

The key/value syntax is very general, but cumbersome for simple tasks. The {{FORMAT}} option allows a number of simplifications.

First, for the explain level, each level can be used as a Boolean option:

{code:sql}
EXPLAIN FORMAT(extended='true') <stmt>
{code}

Second, for Boolean options, the value is optional and "true" is assumed:

{code:sql}
EXPLAIN FORMAT(EXTENDED) <stmt>
{code}

Third, if only a value is given, the value is assumed to be for the "format" key (which is not yet supported):

{code:sql}
EXPLAIN FORMAT('json') <stmt>
{code}

Would, when JSON format is available, emit the plan as JSON.

The astute reader will see opportunities for odd combinations of options. Rather than enforcing a strict set of rules, when given an odd set of rules, the {{FORMAT}} option simply does something reasonable. Example:

{code:sql}
EXPLAIN FORMAT(level='standard', extended, verbose='false') <stmt>
{code}

The short answer here is that options are ambiguous, behavior is undefined, but reasonable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)