You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Carl Steinbach (JIRA)" <ji...@apache.org> on 2009/11/11 01:16:28 UTC

[jira] Created: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
------------------------------------------------------------------------

                 Key: HIVE-924
                 URL: https://issues.apache.org/jira/browse/HIVE-924
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Carl Steinbach


Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785474#action_12785474 ] 

Ning Zhang commented on HIVE-924:
---------------------------------

Carl, I'm looking at the patch but I got some errors when applying to the latest revision. Since this is quite a large patch and touches some files that are changed frequently. Can you explain your design in more detail first and then update the patch with the latest revision?

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789583#action_12789583 ] 

Carl Steinbach commented on HIVE-924:
-------------------------------------

Hi Zheng,

My thoughts exactly.

> desc + DAG: these are the information that needs to be propagated from compile time to execution time. We should put them together and give it a new name.

One option is to call the new runtime desc+DAG combination an OpDesc, 
e.g. the combination of selectDesc and SelectOperator would become SelectOpDesc. What do you think?


> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784310#action_12784310 ] 

Ning Zhang commented on HIVE-924:
---------------------------------

Hi Carl,  sorry that we have not got enough cycles to review this patch in time. We will do it soon.. Before doing that, there are some conflicts when applying this patch to the latest revision. Can you update it to the latest revision? One general questions based on your description: currently SemanticAnalyzer handles query, DML and DDL. Is there any reason that you only split SemanticAnalyzer to DMLSemanticAnalyzer, not the other two?


> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788892#action_12788892 ] 

Zheng Shao commented on HIVE-924:
---------------------------------

Carl, some notes about our meeting today:

Currently we have:
1. ParseContext: compile-time thing only
2. Operators: contains the desc, the DAG, and the transient variables for execution-time only.
3. desc: the description of the operator

I am in favor of an organization like this:
1. CompileTimeOperator: contains information in ParseContext, and desc + DAG.
2. desc + DAG: these are the information that needs to be propagated from compile time to execution time. We should put them together and give it a new name.
3. Operators: for execution time only, constructed from the desc + DAG.

In this way, only desc+DAG is shared between the compile time and execution time and the code can look much cleaner.
Does this look good?


> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789961#action_12789961 ] 

Carl Steinbach commented on HIVE-924:
-------------------------------------

My preference is to do this in a separate patch, but to commit both patches
within a small window of time. All of these refactoring changes affect a large
number of files, and are disruptive to others since they result in a lot of
diffs. Pushing them in together helps to minimize the disruption.

However, if you think the risk of destabilization outweighs the inconvenience
of dealing with lots of diffs twice, then I am fine with waiting a while between
these two changes.


> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reassigned HIVE-924:
-----------------------------------

    Assignee: Carl Steinbach

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789599#action_12789599 ] 

Namit Jain commented on HIVE-924:
---------------------------------

I feel separation of operators into compile and run-time can be done in a different patch. This patch is potentially risky, and may destabilize the stuff.
By doing it in 2 patches, we can potentially minimize the risk.

@Carl, what do you think ? I talked to Zheng and he does not have a strong preference

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784699#action_12784699 ] 

Ning Zhang commented on HIVE-924:
---------------------------------

I like the idea of refactoring the LogicalPlan and PhysicalPlan in general, but since it is too large, can you separate it into several small JIRAs. e.g., 1 JIRA for PhysicalPlan and PhysicalPlan generator and 1 JIRA for LogicalPlan and its generator, and 1 for the rest of misc changes. It is more managable to review and have less chance of conflicting with other patches. 

Some detailed comments:

1) the transform() function now returns void instead of ParseContext/LogicalPlan. This may be OK in the current implementation, but in general if we want to extend the rule-based rewrite system to cost based one, we probably need transform return a different LogicalPlan (the rewritten one) and keep the input plan unchanged. Then we can cost them. The optimize() function is similar, it should return the best plan given an input plan in the cost-based rewrite framework.

2) I noticed that you removed some code and had a comment like "possible bug". Can you come up with some unit tests for such cases that the old code cause bugs? I've seen some diff in the unit test results under this patch, so I'm wondering whether they are caused by these changes. 

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789588#action_12789588 ] 

Zheng Shao commented on HIVE-924:
---------------------------------

SelectOperator contains a lot of transient variables. And selectDesc is a member variable of SelectOperator right now.

I think OpDesc should have those descs, plus the children information. OpDesc should NOT have the transient variables.

Is that good?



> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784473#action_12784473 ] 

Ning Zhang commented on HIVE-924:
---------------------------------

Yes, I can use that revision # to apply your patch.

The CTAS was originally added to DDLSemanticAnalyzer but due to some difficulties, we move it to SemanticAnalyzer. The main reason is that the we need to analyze the select part first to get the schema of the target table and the operator tree, and use that schema to change the last moveTask and the metastore call to create the table. If we do it separately, the DDLSemanticAnalyzer need to recursively call SemanticAnalyzer and workthrough the intermediate result (operator tree) from the output of the SemanticAnalyzer to change the move task and add the DDLTask, and then call the rest of SemanticAnalyzer to generate mapredWork. That makes it more complex and error prone. I think with your refactoring SemanticAlayzer into LogicalPlan and PhysicalPlan, it will make it easier for separating CTAS to DDLSemanticAnalyzer.  

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-924:
--------------------------------

    Attachment: HIVE-924.patch


- Split SemanticAnalyzer into DMLSemanticAnalyzer, LogicalPlan, LogicalPlanGenerator, PhysicalPlan, and PhysicalPlanGenerator classes.

- Removed ParseContext class.

- Refactored GenMRProcContext.

- Fixed several cases of classes that were exposing their internal mutable state.

TODO:

- Consolidate everything under a QueryCompiler class.


> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789972#action_12789972 ] 

Namit Jain commented on HIVE-924:
---------------------------------

OK, let us do it in 2 patches - but do them quickly. 

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784455#action_12784455 ] 

Carl Steinbach commented on HIVE-924:
-------------------------------------

I wanted to add that the current rationale for having (DML)SemanticAnalyzer handle CREATE TABLE is so that it can easily do CTAS. I think it would be cleaner to handle CTAS as a special case requiring both DDLSemanticAnalyzer and DMLSemanticAnalyzer instead of putting all of the CREATE TABLE logic in the SemanticAnalyzer class.

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783823#action_12783823 ] 

Jeff Hammerbacher commented on HIVE-924:
----------------------------------------

Hey,

Has anyone gotten a chance to look at this patch? It's a doozy. and will likely spawn multiple issues, so it would be great to get the reviews started soon. I think the patch has significant value for the Hive codebase, so I'd love to see someone check it out soon!

Thanks,
Jeff

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784448#action_12784448 ] 

Carl Steinbach commented on HIVE-924:
-------------------------------------

Hi Ning,

This is a really big patch that touches a lot of different files.  I don't think it is practical to try to push it in as one big change, and consequently I have not invested time trying to keep it in sync with trunk. If folks think these changes are worth making I was planning to split this JIRA into several (many?) smaller JIRAs and push them in over the course of a week, but I wanted to first give the reviewers an overview of all of the changes and final destination I'm trying to reach.

The current patch is based on trunk@836131.  Is it possible for you to review the patch against this tag?

Currently SemanticAnalyzer handles all DML and one DDL statement (CREATE TABLE), and DDLSemanticAnalyzer handles the remaining DDL statements (DROP TABLE, ALTER TABLE, etc). I changed the name from SemanticAnalyzer to DMLSemanticAnalyzer because 1) its main purpose is processing DML, and 2) I think we should try to move the CREATE TABLE code to DDLSemanticAnalyzer.

> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-924) Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789595#action_12789595 ] 

Carl Steinbach commented on HIVE-924:
-------------------------------------

> One option is to call the new runtime desc+DAG combination an OpDesc, 

Sorry, I meant to say compile-time instead of run-time.

> OpDesc should NOT have the transient variables.

Agreed!


> Extract LogicalPlan and PhysicalPlan classes from SemanticAnalysis class
> ------------------------------------------------------------------------
>
>                 Key: HIVE-924
>                 URL: https://issues.apache.org/jira/browse/HIVE-924
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>         Attachments: HIVE-924.patch
>
>
> Currently the SemanticAnalyzer class handles semantic analysis, as well as logical plan generation and physical plan generation. I think it would be beneficial to extract distinct LogicalPlan and PhysicalPlan classes from the SemanticAnalyzer, and have the query processing phase be coordinated by a QueryCompiler class that would be responsible for triggering the parsing, semantic analysis, logical plan generation, optimization, and physical plan generation phases. This proposed reorganization of components would help to isolate the state of each phase, and would also bring the source into closer alignment with the description of the query compiler in the Hive design document on the wiki.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.