Posted to common-dev@hadoop.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2008/09/05 19:40:44 UTC

[jira] Created: (HADOOP-4084) Add explain plan capabilities to Hive QL

Add explain plan capabilities to Hive QL
----------------------------------------

                 Key: HADOOP-4084
                 URL: https://issues.apache.org/jira/browse/HADOOP-4084
             Project: Hadoop Core
          Issue Type: New Feature
          Components: contrib/hive
            Reporter: Ashish Thusoo
            Assignee: Ashish Thusoo


Adding explain plan for queries in hive.

The current proposal is to support something like:

EXPLAIN [EXTENDED]
SELECT ....

This will output the following:

Abstract Syntax Tree:

Number of Stages:

Dependencies between Stages:

Plan for each stage:

If the EXTENDED keyword is used, much more information will be emitted, whereas without that keyword only logical information will be emitted.

e.g. In case of a group by query 

EXPLAIN
SELECT T.c1, count(1) FROM T GROUP BY T.c1;

The explain plan itself has two stages

Stage1 and Stage2

Stage1 will have the plan for generating the partial aggregates
and Stage2 will have the plan for generating the complete aggregates.
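
To make the two stages concrete, here is a minimal sketch in plain Java. It is not Hive's operator code: TwoStageCountSketch, partialCounts and completeCounts are names made up for this illustration, and in-memory lists stand in for the splits of T. It only shows the shape of the computation, partial counts per split followed by a merge into complete counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java collections stand in for the two stages.
// None of these names come from the Hive patch.
class TwoStageCountSketch {

    // Stage 1: one split of T produces partial counts per c1 value.
    static Map<String, Long> partialCounts(List<String> c1Values) {
        Map<String, Long> partial = new TreeMap<>();
        for (String c1 : c1Values) {
            partial.merge(c1, 1L, Long::sum);
        }
        return partial;
    }

    // Stage 2: partial counts for the same key are summed into complete counts.
    static Map<String, Long> completeCounts(List<Map<String, Long>> partials) {
        Map<String, Long> complete = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                complete.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return complete;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> partials = new ArrayList<>();
        partials.add(partialCounts(Arrays.asList("a", "b", "a")));
        partials.add(partialCounts(Arrays.asList("b", "c")));
        System.out.println(completeCounts(partials)); // prints {a=2, b=2, c=1}
    }
}
```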

I also plan to convert the parse and semantic analysis tests so that they use this for finding differences in the plan instead of the programmatic plan dumps that we are using today (tests/queries/positive).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633304#action_12633304 ] 

Hudson commented on HADOOP-4084:
--------------------------------

Integrated in Hadoop-trunk #611 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/])



[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631961#action_12631961 ] 

Namit Jain commented on HADOOP-4084:
------------------------------------

+1



[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631959#action_12631959 ] 

Ashish Thusoo commented on HADOOP-4084:
---------------------------------------

Looked at this with Dhrubha...

It seems that Hadoop QA is not able to handle deleted files. UDAFRegistry.java and UDFRegistry.java have been deleted, but the patch command converts the deletes into empty files and does not actually remove the files, hence the failure. Is there a more permanent solution to this, so that the patch command is able to delete files? Maybe we should upload the svn stat output as well to help with this.

Thoughts?




[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632126#action_12632126 ] 

Hadoop QA commented on HADOOP-4084:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12390313/patch-4084_v3
  against trunk revision 696551.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 750 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3297/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3297/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3297/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3297/console

This message is automatically generated.



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-4084:
----------------------------------

    Attachment: patch-4084_v3

New patch after resolving merge conflicts.



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-4084:
----------------------------------

    Attachment: patch-4084

Uploaded patch for support to explain plan and some other minor fixes and cleanups.

A note on the implementation of explain plan: it is implemented through a Java annotation, @explain (implemented in explain.java).

This annotation can take two optional arguments:

@explain(displayName="name", normalExplain=false)

By default, displayName is "" and normalExplain is true.

displayName is the string used to display this class/return value of a method in the plan. normalExplain=false means that this class/return value of a method should only be displayed in case of an extended plan.
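
As a rough sketch of what such an annotation could look like, the following declares one with the two optional arguments and the defaults described above, then reads it back through reflection. This is an assumption-laden illustration, not the code in explain.java: the retention and target settings, the wrapper class ExplainAnnotationSketch, the sample class GroupBySketch and the helpers noteOn and isShown are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch only: every name here is hypothetical, not taken from the patch.
class ExplainAnnotationSketch {

    // Assumed shape of the annotation: two optional arguments with defaults.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface explain {
        String displayName() default "";
        boolean normalExplain() default true;
    }

    // Made-up plan class: "keys" shows in every plan, "mode" only in EXTENDED.
    @explain(displayName = "Group By Operator")
    static class GroupBySketch {
        @explain(displayName = "keys")
        public String getKeys() { return "T.c1"; }

        @explain(displayName = "mode", normalExplain = false)
        public String getMode() { return "partial"; }
    }

    // Reads the annotation from a public getter; null when it is not annotated.
    static explain noteOn(Class<?> type, String getter) {
        try {
            return type.getMethod(getter).getAnnotation(explain.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such getter: " + getter, e);
        }
    }

    // An element is shown in an extended plan, or when normalExplain is true.
    static boolean isShown(explain note, boolean extended) {
        return note != null && (extended || note.normalExplain());
    }

    public static void main(String[] args) {
        explain keys = noteOn(GroupBySketch.class, "getKeys");
        explain mode = noteOn(GroupBySketch.class, "getMode");
        System.out.println(keys.displayName() + " in normal plan: " + isShown(keys, false));
        System.out.println(mode.displayName() + " in normal plan: " + isShown(mode, false));
        System.out.println(mode.displayName() + " in extended plan: " + isShown(mode, true));
    }
}
```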

Additionally, there is an ExplainTask that does the actual explain; it contains the explainWork, which holds the AST string and the rootTasks that need to be explained.

The general format of explain is

PARSE TREE

STAGE DEPENDENCIES

STAGE PLAN:
  Plan for Stage 1
 
  Plan for Stage 2 
.
.
.

Within the plan, the parent-child relationship is shown through indentation, and the names of operators are displayed as specified in the @explain annotation (if the displayName is "" then the name is not displayed in the plan).
Additionally, each public function of the class that is annotated with @explain is called, and explain is called recursively on values that are not primitives or strings (maps, lists or other classes). For primitive and string values we just print the value.
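
The traversal described here can be sketched roughly as follows. This is only my reading of the description, not the ExplainTask code from the patch: the annotation is redeclared inline so the block stands alone, and ExplainWalkerSketch, GroupBySketch, SelectSketch, render and walk are hypothetical names. Methods are sorted by name purely to keep the output order stable, since reflection does not guarantee any order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch only: every name here is hypothetical, not taken from the patch.
class ExplainWalkerSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface explain {
        String displayName() default "";
        boolean normalExplain() default true;
    }

    @explain(displayName = "Select Operator")
    static class SelectSketch {
        @explain(displayName = "expressions")
        public List<String> getExpressions() { return Arrays.asList("T.c1", "count(1)"); }
    }

    @explain(displayName = "Group By Operator")
    static class GroupBySketch {
        @explain(displayName = "keys")
        public String getKeys() { return "T.c1"; }

        @explain(displayName = "mode", normalExplain = false)
        public String getMode() { return "partial"; }

        @explain // empty displayName: no label line of its own
        public SelectSketch getNested() { return new SelectSketch(); }
    }

    static String render(Object plan, boolean extended) {
        StringBuilder out = new StringBuilder();
        try {
            walk(plan, extended, 0, out);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not explain " + plan, e);
        }
        return out.toString();
    }

    private static boolean isLeaf(Object v) {
        return v instanceof String || v instanceof Number
                || v instanceof Boolean || v instanceof Character;
    }

    private static void line(StringBuilder out, int indent, String text) {
        for (int i = 0; i < indent; i++) out.append("  ");
        out.append(text).append('\n');
    }

    private static void walk(Object value, boolean extended, int indent, StringBuilder out)
            throws ReflectiveOperationException {
        if (value == null) return;
        // Primitive wrappers and strings are printed as-is.
        if (isLeaf(value)) { line(out, indent, value.toString()); return; }
        // Maps and lists: recurse into their contents.
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                line(out, indent, String.valueOf(e.getKey()));
                walk(e.getValue(), extended, indent + 1, out);
            }
            return;
        }
        if (value instanceof Iterable) {
            for (Object item : (Iterable<?>) value) walk(item, extended, indent, out);
            return;
        }
        // Other classes: only annotated ones are shown.
        explain classNote = value.getClass().getAnnotation(explain.class);
        if (classNote == null || !(extended || classNote.normalExplain())) return;
        if (!classNote.displayName().isEmpty()) {
            line(out, indent, classNote.displayName());
            indent++; // children are indented under the parent's name
        }
        Method[] methods = value.getClass().getMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            explain note = m.getAnnotation(explain.class);
            if (note == null || m.getParameterCount() != 0) continue;
            if (!(extended || note.normalExplain())) continue;
            Object result = m.invoke(value);
            if (result == null) continue;
            String name = note.displayName();
            if (isLeaf(result)) {
                line(out, indent, name.isEmpty() ? result.toString() : name + ": " + result);
            } else {
                int childIndent = indent;
                if (!name.isEmpty()) { line(out, indent, name + ":"); childIndent++; }
                walk(result, extended, childIndent, out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(render(new GroupBySketch(), false));
    }
}
```

Calling render with extended set to true would additionally show the "mode" line, since that getter is marked normalExplain=false.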

In the future I can make this XML instead of the text blobs that I am producing now.

The minor fixes/refactors include:
1. -Doverwrite=true option for running tests with the purpose of capturing results. So now, if you have to capture the results of TestCliDriver, you can run

      ant -Dtestcase=TestCliDriver -Doverwrite=true clean-test test

   The same is true for TestParse and TestParseNegative.
   Additionally, for all these tests you can run a specific query by using -Dqfile, e.g.

      ant -Dtestcase=TestParse -Dqfile=input1.q -Doverwrite=true clean-test test

   would run the parse test on input1.q and capture its output in the source tree.

2. Fixed some warnings related to generics

3. Changed the location of velocity.log to be in the build directory for TestCliDriver (this was already in the build location for TestParse and TestParseNegative)

4. Unified the function registries for UDF and UDAF and introduced the notion of displayName in FunctionInfo and FunctionRegistry which is used to show the function in the plan.





[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631948#action_12631948 ] 

Ashish Thusoo commented on HADOOP-4084:
---------------------------------------

I don't know why this is not compiling. I checked the submitted patch, and the references to UDAFRegistry have actually been removed from the sources. Is this a merge problem?

The part of the patch file that does this is as follows:

Index: src/contrib/hive/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
===================================================================
--- src/contrib/hive/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java  (revision 695984)
+++ src/contrib/hive/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java  (working copy)
@@ -134,7 +134,7 @@
       assert (expressionTree.getChildCount() != 0);
       assert (expressionTree.getChild(0).getType() == HiveParser.Identifier);
       String functionName = expressionTree.getChild(0).getText();
-      if (UDAFRegistry.getUDAF(functionName) != null) {
+      if (FunctionRegistry.getUDAF(functionName) != null) {
         aggregations.put(expressionTree.toStringTree(), expressionTree);
         return;
       }
@@ -987,7 +987,7 @@
     for (Map.Entry<String, CommonTree> entry : aggregationTrees.entrySet()) {
       CommonTree value = entry.getValue();
       String aggName = value.getChild(0).getText();
-      Class<? extends UDAF> aggClass = UDAFRegistry.getUDAF(aggName);
+      Class<? extends UDAF> aggClass = FunctionRegistry.getUDAF(aggName);
       assert (aggClass != null);
       ArrayList<exprNodeDesc> aggParameters = new ArrayList<exprNodeDesc>();
       ArrayList<Class<?>> aggClasses = new ArrayList<Class<?>>();
@@ -1006,7 +1006,7 @@
         aggClasses.add(paraExprInfo.getType().getPrimitiveClass());
       }

-      if (null == UDAFRegistry.getUDAFMethod(aggName, aggClasses)) {
+      if (null == FunctionRegistry.getUDAFMethod(aggName, aggClasses)) {
         String reason = "Looking for UDAF \"" + aggName + "\" with parame



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-4084:
----------------------------------

    Status: Open  (was: Patch Available)

The previous patch had merge conflicts, as it was generated from an older version of trunk and trunk has moved since.



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-4084:
----------------------------------

    Fix Version/s: 0.19.0
     Hadoop Flags: [Reviewed]
           Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631466#action_12631466 ] 

Namit Jain commented on HADOOP-4084:
------------------------------------


Overall, looks good - just some minor comments.

LoadSemanticAnalyzer: line 32/33 - nitpick: remove commented import 
ExplainSemanticAnalyzer (line 47): not needed 
explain.java: no apache license header on top of file 
DDLWork.java: why display name only for create table? Why not for other DDLs?
MapRedTask: (line 91) remove explain() 
MoveTask: remove explain()






[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631973#action_12631973 ] 

Ashish Thusoo commented on HADOOP-4084:
---------------------------------------

Actually I found the problem. 

This patch conflicted with a checkin made by Namit that added a UDAFRegistry reference to SemanticAnalyzer; my patch, which was generated from a previous version of the tree, did not have those changes. Will fix and submit a new patch.





[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4084:
------------------------------------

    Release Note: Introduced "EXPLAIN" plan for Hive.  (was: Explain plan for hive)



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-4084:
----------------------------------

    Status: Patch Available  (was: Open)

New patch with resolved merge conflicts.

> Add explain plan capabilities to Hive QL
> ----------------------------------------
>
>                 Key: HADOOP-4084
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4084
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>             Fix For: 0.19.0
>
>         Attachments: patch-4084, patch-4084, patch-4084_v3
>
>
> Adding explain plan for queries in hive.
> The current proposal is to support something like:
> EXPLAIN [EXTENDED]
> SELECT ....
> This will output the following:
> Abstract Syntax Tree:
> Number of Stages:
> Dependencies between Stages:
> Plan for each stage:
> If the EXTENDED keyword is used, much more information is emitted, whereas without it only logical information is emitted.
> For example, in the case of a GROUP BY query:
> EXPLAIN
> SELECT T.c1, count(1) FROM T GROUP BY T.c1;
> The explain plan itself has two stages, Stage1 and Stage2.
> Stage1 will have the plan for generating the partial aggregates,
> and Stage2 will have the plan for generating the complete aggregates.
> I also plan to convert the parse and semantic analysis tests so that they use this for finding differences in the plan instead of the programmatic plan dumps that we are using today (tests/queries/positive).



[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631926#action_12631926 ] 

Hadoop QA commented on HADOOP-4084:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12390212/patch-4084
  against trunk revision 696427.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 750 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3287/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3287/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3287/console

This message is automatically generated.

> Add explain plan capabilities to Hive QL
> ----------------------------------------
>
>                 Key: HADOOP-4084
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4084
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>             Fix For: 0.19.0
>
>         Attachments: patch-4084, patch-4084
>
>
> Adding explain plan for queries in hive.
> The current proposal is to support something like:
> EXPLAIN [EXTENDED]
> SELECT ....
> This will output the following:
> Abstract Syntax Tree:
> Number of Stages:
> Dependencies between Stages:
> Plan for each stage:
> If the EXTENDED keyword is used, much more information is emitted, whereas without it only logical information is emitted.
> For example, in the case of a GROUP BY query:
> EXPLAIN
> SELECT T.c1, count(1) FROM T GROUP BY T.c1;
> The explain plan itself has two stages, Stage1 and Stage2.
> Stage1 will have the plan for generating the partial aggregates,
> and Stage2 will have the plan for generating the complete aggregates.
> I also plan to convert the parse and semantic analysis tests so that they use this for finding differences in the plan instead of the programmatic plan dumps that we are using today (tests/queries/positive).



[jira] Commented: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631779#action_12631779 ] 

Ashish Thusoo commented on HADOOP-4084:
---------------------------------------

Hey Namit,

I did not realize that you had not given a +1 on this yet. Can you take a look?

Thanks,
Ashish

> Add explain plan capabilities to Hive QL
> ----------------------------------------
>
>                 Key: HADOOP-4084
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4084
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>             Fix For: 0.19.0
>
>         Attachments: patch-4084, patch-4084
>
>
> Adding explain plan for queries in hive.
> The current proposal is to support something like:
> EXPLAIN [EXTENDED]
> SELECT ....
> This will output the following:
> Abstract Syntax Tree:
> Number of Stages:
> Dependencies between Stages:
> Plan for each stage:
> If the EXTENDED keyword is used, much more information is emitted, whereas without it only logical information is emitted.
> For example, in the case of a GROUP BY query:
> EXPLAIN
> SELECT T.c1, count(1) FROM T GROUP BY T.c1;
> The explain plan itself has two stages, Stage1 and Stage2.
> Stage1 will have the plan for generating the partial aggregates,
> and Stage2 will have the plan for generating the complete aggregates.
> I also plan to convert the parse and semantic analysis tests so that they use this for finding differences in the plan instead of the programmatic plan dumps that we are using today (tests/queries/positive).



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HADOOP-4084:
----------------------------------

    Attachment: patch-4084

Incorporated Namit's comments.

> Add explain plan capabilities to Hive QL
> ----------------------------------------
>
>                 Key: HADOOP-4084
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4084
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>             Fix For: 0.19.0
>
>         Attachments: patch-4084, patch-4084
>
>
> Adding explain plan for queries in hive.
> The current proposal is to support something like:
> EXPLAIN [EXTENDED]
> SELECT ....
> This will output the following:
> Abstract Syntax Tree:
> Number of Stages:
> Dependencies between Stages:
> Plan for each stage:
> If the EXTENDED keyword is used, much more information is emitted, whereas without it only logical information is emitted.
> For example, in the case of a GROUP BY query:
> EXPLAIN
> SELECT T.c1, count(1) FROM T GROUP BY T.c1;
> The explain plan itself has two stages, Stage1 and Stage2.
> Stage1 will have the plan for generating the partial aggregates,
> and Stage2 will have the plan for generating the complete aggregates.
> I also plan to convert the parse and semantic analysis tests so that they use this for finding differences in the plan instead of the programmatic plan dumps that we are using today (tests/queries/positive).



[jira] Updated: (HADOOP-4084) Add explain plan capabilities to Hive QL

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4084:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Ashish!

> Add explain plan capabilities to Hive QL
> ----------------------------------------
>
>                 Key: HADOOP-4084
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4084
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>             Fix For: 0.19.0
>
>         Attachments: patch-4084, patch-4084, patch-4084_v3
>
>
> Adding explain plan for queries in hive.
> The current proposal is to support something like:
> EXPLAIN [EXTENDED]
> SELECT ....
> This will output the following:
> Abstract Syntax Tree:
> Number of Stages:
> Dependencies between Stages:
> Plan for each stage:
> If the EXTENDED keyword is used, much more information is emitted, whereas without it only logical information is emitted.
> For example, in the case of a GROUP BY query:
> EXPLAIN
> SELECT T.c1, count(1) FROM T GROUP BY T.c1;
> The explain plan itself has two stages, Stage1 and Stage2.
> Stage1 will have the plan for generating the partial aggregates,
> and Stage2 will have the plan for generating the complete aggregates.
> I also plan to convert the parse and semantic analysis tests so that they use this for finding differences in the plan instead of the programmatic plan dumps that we are using today (tests/queries/positive).
