Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/08/04 20:09:18 UTC

[jira] Created: (HIVE-1511) Hive plan serialization is slow

Hive plan serialization is slow
-------------------------------

                 Key: HIVE-1511
                 URL: https://issues.apache.org/jira/browse/HIVE-1511
             Project: Hadoop Hive
          Issue Type: Improvement
    Affects Versions: 0.7.0
            Reporter: Ning Zhang


As reported by Edward Capriolo:

For reference I did this as a test case....
SELECT * FROM src where
key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
OR key=0 OR key=0 OR key=0 OR
key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
OR key=0 OR key=0 OR key=0 OR
...(100 more of these)

There was no OOM, but I gave up after the test case made no visible
progress for about 2 minutes.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1511) Hive plan serialization is slow

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895343#action_12895343 ] 

Ning Zhang commented on HIVE-1511:
----------------------------------

The issue seems to be that we serialize the plan by writing directly to an HDFS file. We should probably cache it locally first and then write it to HDFS.
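
As a rough illustration of the "cache locally, then write" idea, here is a minimal sketch in Python. It is not Hive's code: CountingSink is a hypothetical stand-in for a remote output stream, and serialize_plan is a toy serializer invented for the example. It only shows how buffering in memory turns many small writes into one large write; whether that accounts for the slowdown reported here is an assumption.

```python
import io


class CountingSink:
    """Hypothetical stand-in for a remote (e.g. HDFS) output stream.

    It only records how many write calls it receives. A real remote
    stream may pay a per-call cost that this sketch does not model.
    """

    def __init__(self):
        self.calls = 0
        self.data = bytearray()

    def write(self, chunk):
        self.calls += 1
        self.data.extend(chunk)
        return len(chunk)


def serialize_plan(predicates, out):
    """Toy serializer (assumption): two small writes per predicate."""
    for predicate in predicates:
        out.write(predicate.encode("utf-8"))
        out.write(b"\n")


def serialize_direct(predicates, sink):
    """Every small write goes straight to the sink."""
    serialize_plan(predicates, sink)


def serialize_buffered(predicates, sink):
    """Serialize into a local in-memory buffer, then write once."""
    buf = io.BytesIO()
    serialize_plan(predicates, buf)
    sink.write(buf.getvalue())


predicates = ["key=0"] * 100
direct, buffered = CountingSink(), CountingSink()
serialize_direct(predicates, direct)
serialize_buffered(predicates, buffered)
print(direct.calls, buffered.calls)
```

Both sinks end up with identical bytes; only the number of write calls differs.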


[jira] Commented: (HIVE-1511) Hive plan serialization is slow

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895352#action_12895352 ] 

Edward Capriolo commented on HIVE-1511:
---------------------------------------

Another option might be a clever way to remove duplicate expressions that evaluate to the same result, such as the repeated key=0 predicates.
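
A minimal sketch of that idea in Python, assuming the OR operands are already available as a list of strings. The function names are hypothetical, and nothing here reflects how Hive represents expressions internally. It relies on A OR A being equivalent to A, and it only catches operands that are textually identical after trimming whitespace, not ones that are merely semantically equivalent.

```python
def dedupe_or_operands(operands):
    """Drop repeated OR operands, keeping first-seen order.

    Comparison is exact text after stripping surrounding whitespace,
    so 'key=0' and 'key = 0' are still treated as different operands.
    """
    seen = set()
    unique = []
    for operand in operands:
        normalized = operand.strip()
        if normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    return unique


def build_where(operands):
    """Join the de-duplicated operands back into one OR expression."""
    return " OR ".join(dedupe_or_operands(operands))


print(build_where(["key=0"] * 8))
```

For the test query in this issue, every operand collapses into a single key=0 predicate.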
