Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2010/01/29 00:45:34 UTC

[jira] Created: (HIVE-1117) Make QueryPlan serializable

Make QueryPlan serializable
---------------------------

                 Key: HIVE-1117
                 URL: https://issues.apache.org/jira/browse/HIVE-1117
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao
            Assignee: Zheng Shao
             Fix For: 0.6.0


We need to make QueryPlan serializable so that we can resume the query some time later.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833101#action_12833101 ] 

Jeff Hammerbacher commented on HIVE-1117:
-----------------------------------------

Seems totally unnecessary. What are the advantages over Avro reflection? Are these complex objects?



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1117:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

committed. Thanks Zheng



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834113#action_12834113 ] 

Zheng Shao commented on HIVE-1117:
----------------------------------

By using the deserialized plan, we can enforce that the plan (after the serialization/deserialization round trip) still works for all unit tests.

In the next transaction (HIVE-1100), I plan to completely break compile and execute apart, so that the serialization happens in compile, and deserialization happens in execute.




[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834106#action_12834106 ] 

Namit Jain commented on HIVE-1117:
----------------------------------

Why are you using the deserialized plan in Driver.compile()? Is it for testing, and do you plan to remove it after some time?
Otherwise, it looks good.



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1117:
-----------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1117:
-----------------------------

    Attachment: HIVE-1119.1.patch

This patch makes all tasks and works serializable.

Currently there are no additional tests.
Once the whole QueryPlan is serializable, we will serialize the whole query plan after compilation and deserialize the whole plan before execution. This will automatically test all the serialization/deserialization.

I prefer to commit this one first given that major code refactoring/clean-up is happening.




[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832851#action_12832851 ] 

Zheng Shao commented on HIVE-1117:
----------------------------------

A link for XMLEncoder that is used in Hive to serialize/deserialize execution plans: http://java.sun.com/products/jfc/tsc/articles/persistence4/ 
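
For readers unfamiliar with it, the round trip through java.beans.XMLEncoder and XMLDecoder can be sketched roughly as below. This is only an illustration of the JDK API, not Hive's actual code: the class XmlPlanRoundTrip and the bean PlanBean are hypothetical stand-ins for the real plan classes, and the sketch assumes the bean follows JavaBeans conventions (public class, public no-argument constructor, getter/setter pairs).

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlPlanRoundTrip {

    // Hypothetical stand-in for a plan object; this is NOT a Hive class.
    public static class PlanBean {
        private String queryId;
        private int numTasks;

        public PlanBean() {}

        public String getQueryId() { return queryId; }
        public void setQueryId(String queryId) { this.queryId = queryId; }

        public int getNumTasks() { return numTasks; }
        public void setNumTasks(int numTasks) { this.numTasks = numTasks; }
    }

    // Serialize a bean to XML bytes. Closing the encoder flushes the output.
    public static byte[] serialize(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return out.toByteArray();
    }

    // Rebuild the object graph from the XML bytes.
    public static Object deserialize(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        PlanBean plan = new PlanBean();
        plan.setQueryId("example-query");
        plan.setNumTasks(3);

        byte[] xml = serialize(plan);
        // The XML is human-readable, which is what makes it handy for debugging.
        System.out.println(new String(xml, StandardCharsets.UTF_8));

        PlanBean copy = (PlanBean) deserialize(xml);
        System.out.println(copy.getQueryId() + " / " + copy.getNumTasks());
    }
}
```

Note that XMLEncoder only writes properties whose values differ from those of a freshly constructed instance, so the XML stays fairly small for beans that are mostly defaults.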



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1117:
-----------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1117:
-----------------------------

    Attachment:     (was: HIVE-1119.1.patch)



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834363#action_12834363 ] 

Zheng Shao commented on HIVE-1117:
----------------------------------

Did you apply both HIVE-1117.test.patch and HIVE-1117.code.patch?

Zheng on blackberry





[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833117#action_12833117 ] 

Ashish Thusoo commented on HIVE-1117:
-------------------------------------

What would be the advantage of using Avro here? We do not really have a requirement for cross-language clients for this. To me, throwing Avro into the mix just adds another dependency that is not really needed, no?



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833168#action_12833168 ] 

Edward Capriolo commented on HIVE-1117:
---------------------------------------

Personally, I think the Java XML serialization is very cool. It is very good for dumping complex object graphs, and you usually need to write zero lines of deserialization/reserialization code. It is not the tightest XML, but it's a very generic format. It does not seem like this portion of the code will need to be high performance or cross platform; that would be the argument for Avro, correct?

As for JSON, well, I guess I am an old-timer: I still like XML. Is anyone going to make the case for using s-expressions? :)



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834360#action_12834360 ] 

Namit Jain commented on HIVE-1117:
----------------------------------

TestParse failed - can you update the outputs for the TestParse results?



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1117:
-----------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834372#action_12834372 ] 

Namit Jain commented on HIVE-1117:
----------------------------------

+1

will commit if the tests pass



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828035#action_12828035 ] 

Jeff Hammerbacher commented on HIVE-1117:
-----------------------------------------

You could use the Avro JSON Encoder to serialize the data as JSON while still using a schema to describe it (and potentially evolve it).



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828028#action_12828028 ] 

Zheng Shao commented on HIVE-1117:
----------------------------------

We have been using Java serialization with the XML format for the map-reduce plans in Hive.
For now, we will continue to use the XML format to ease debugging.

For Avro, shall we first get it working with SerDe?



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806650#action_12806650 ] 

Jeff Hammerbacher commented on HIVE-1117:
-----------------------------------------

I don't see a patch? Would be curious to see the serialization used--Avro would be an excellent choice.



[jira] Updated: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1117:
-----------------------------

    Attachment: HIVE-1117.1.test.patch
                HIVE-1117.1.code.patch



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833118#action_12833118 ] 

Zheng Shao commented on HIVE-1117:
----------------------------------

Hi Jeff, Hive was using the XMLEncoder for the plan from the beginning. I don't see a strong reason that we need to switch now. Maybe we can revisit it at some time later when there is a need.

Also, I have a question about Avro: does it support circular references in the object graph? For example, a plan object contains a field that is an operator object, and the operator object contains a field that stores the plan object.
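
To make the circular case concrete, here is a small sketch of a plan that points to an operator that points back to the plan, round-tripped through the JDK's XMLEncoder and XMLDecoder. It says nothing about how Avro behaves. The classes CyclicPlanRoundTrip, Plan, and Operator are hypothetical stand-ins, not the Hive classes, and the sketch relies on XMLEncoder writing id/idref attributes for objects that are referenced more than once.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class CyclicPlanRoundTrip {

    // Hypothetical plan bean holding a reference to its operator.
    public static class Plan {
        private Operator operator;

        public Plan() {}

        public Operator getOperator() { return operator; }
        public void setOperator(Operator operator) { this.operator = operator; }
    }

    // Hypothetical operator bean holding a back-reference to its plan.
    public static class Operator {
        private Plan plan;
        private String name;

        public Operator() {}

        public Plan getPlan() { return plan; }
        public void setPlan(Plan plan) { this.plan = plan; }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Encode the plan to XML in memory and decode it again.
    public static Plan roundTrip(Plan plan) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(plan);
        }
        try (XMLDecoder decoder =
                 new XMLDecoder(new ByteArrayInputStream(out.toByteArray()))) {
            return (Plan) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Plan plan = new Plan();
        Operator op = new Operator();
        op.setName("example-operator");
        plan.setOperator(op);
        op.setPlan(plan); // closes the cycle: plan -> operator -> plan

        Plan copy = roundTrip(plan);
        // If the cycle survived, the operator's plan is the very same copy.
        System.out.println("cycle preserved: "
            + (copy.getOperator().getPlan() == copy));
    }
}
```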

