Posted to dev@hive.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2009/01/10 16:25:59 UTC

[jira] Created: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.
------------------------------------------------------------------------------------------------------------------------------

                 Key: HIVE-222
                 URL: https://issues.apache.org/jira/browse/HIVE-222
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Ashish Thusoo
            Assignee: Ashish Thusoo


For queries of the form (groupby2_map.q in the source):

SELECT x, count(DISTINCT y), SUM(y) FROM t GROUP BY x

when map-side aggregation is on:

hive.map.aggr=true (this is off by default)

the following exception can occur:
    [junit] Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
    [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeTypeDouble.serialize(DynamicSerDeTypeDouble.java:60)
    [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeFieldList.serialize(DynamicSerDeFieldList.java:235)
    [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeStructBase.serialize(DynamicSerDeStructBase.java:81)
    [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.serialize(DynamicSerDe.java:174)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-222:
-------------------------------

    Attachment: patch-222.txt

Fix for the bug.

There was a bug in the way the aggregation list was being generated for map-side aggregation. As a result, the ordering of the aggregations in the map-side groupby operator and the reduce-side groupby operator would differ, leading to this problem. Ideally, we should use the row schema information to generate the order, but that needs a much larger refactor of how we generate plans in the group-by case. For now, this patch should fix the problem.
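
To illustrate the failure mode described above, here is a minimal Java sketch. It is not the Hive code: the class AggregationOrderSketch and its methods are hypothetical, and it only assumes that a row is serialized positionally against a fixed list of field types. When the row carries the aggregates in a different order than the schema expects, a Long lands in a slot typed as double and the cast fails, much like the stack trace in the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hive.
class AggregationOrderSketch {

    // Stand-in for a typed field serializer: it expects a Double and casts blindly.
    static String serializeDouble(Object value) {
        return "double:" + (Double) value; // ClassCastException if value is a Long
    }

    // Stand-in for a typed field serializer that expects a Long.
    static String serializeLong(Object value) {
        return "long:" + (Long) value; // ClassCastException if value is a Double
    }

    // Serializes a row position by position against a schema of type names.
    static List<String> serializeRow(List<String> schema, List<Object> row) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < schema.size(); i++) {
            Object v = row.get(i);
            out.add(schema.get(i).equals("double") ? serializeDouble(v) : serializeLong(v));
        }
        return out;
    }

    public static void main(String[] args) {
        // Order assumed by the consumer of the row: [sum(y), count(DISTINCT y)].
        List<String> schema = Arrays.asList("double", "long");

        // Aggregates emitted in the order the schema expects: serializes fine.
        System.out.println(serializeRow(schema, Arrays.<Object>asList(42.0, 3L)));

        // Aggregates emitted in the opposite order: the Long hits the double slot.
        try {
            serializeRow(schema, Arrays.<Object>asList(3L, 42.0));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The sketch hard-codes the two orders to make the mismatch visible; in the real plan the two orders come from separately generated aggregation lists.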

There are preexisting tests that cover this (groupby2_map.q and groupby3_map.q). The test case, however, relies on an internal hashmap giving the keys in a certain order. The bug was easily reproducible with the patch in HIVE-179 applied, and I have tested the fix with that patch.




[jira] Updated: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-222:
-------------------------------

             Priority: Blocker  (was: Major)
    Affects Version/s: 0.2.0
        Fix Version/s: 0.2.0



[jira] Updated: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-222:
----------------------------

      Resolution: Fixed
    Release Note: HIVE-222. Fixed Group by on a combination of distinct and non-distinct aggregates. (Ashish Thusoo via zshao)
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed revision 734008. Thanks Ashish!




[jira] Commented: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "David Phillips (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664854#action_12664854 ] 

David Phillips commented on HIVE-222:
-------------------------------------

Is there any value in adding the test case from HIVE-215?



[jira] Updated: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-222:
-------------------------------

    Status: Patch Available  (was: Open)

Submitting patch.



[jira] Commented: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663056#action_12663056 ] 

Prasad Chakka commented on HIVE-222:
------------------------------------

Looks good, +1.

Though the patch doesn't guarantee that HashMap will return objects in the same order in both functions.
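
As a small illustration of this concern (a sketch only, not part of the patch; the class IterationOrderSketch and its helpers are hypothetical): java.util.HashMap makes no promise about iteration order, whereas java.util.LinkedHashMap iterates in insertion order. Code that must enumerate the same keys in the same order in two places could, under that assumption, build on an insertion-ordered map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hive.
class IterationOrderSketch {

    // Inserts two aggregation expressions (with their result types) in a fixed order.
    static Map<String, String> fill(Map<String, String> m) {
        m.put("count(DISTINCT y)", "long");
        m.put("sum(y)", "double");
        return m;
    }

    // Returns the keys in whatever order the map iterates them.
    static List<String> keysInIterationOrder(Map<String, String> m) {
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: iteration order is the insertion order, by contract.
        System.out.println(keysInIterationOrder(fill(new LinkedHashMap<String, String>())));

        // HashMap: iteration order is unspecified and may not match insertion order.
        System.out.println(keysInIterationOrder(fill(new HashMap<String, String>())));
    }
}
```

Whether this fits the plan generation code is a separate question; the sketch only shows the difference in the ordering contract of the two map types.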



[jira] Updated: (HIVE-222) Group by on a combination of distinct and non-distinct aggregates can return serialization errors with map side aggregations.

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-222:
--------------------------------

        Fix Version/s: 0.3.0
                           (was: 0.6.0)
    Affects Version/s:     (was: 0.6.0)
