Posted to dev@hive.apache.org by "Kevin Wilfong (Created) (JIRA)" <ji...@apache.org> on 2012/01/26 03:09:40 UTC

[jira] [Created] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

Hive multi group by single reducer optimization causes invalid column reference error
-------------------------------------------------------------------------------------

                 Key: HIVE-2750
                 URL: https://issues.apache.org/jira/browse/HIVE-2750
             Project: Hive
          Issue Type: Bug
            Reporter: Kevin Wilfong
            Assignee: Kevin Wilfong


After the optimization, if two query blocks have the same distinct clause and the same group by keys, but the first query block does not reference all the columns the second query block does, an invalid column reference error is raised for the columns not referenced in the first query block.

E.g.
FROM src
INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY substr(src.key,1,1);

This query fails with an invalid column reference error on src.value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

Posted by "Amareshwari Sriramadasu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-2750:
------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0
           Status: Resolved  (was: Patch Available)

It seems this issue was never marked as resolved. Resolving.

                

[jira] [Commented] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193949#comment-13193949 ] 

Hudson commented on HIVE-2750:
------------------------------

Integrated in Hive-trunk-h0.21 #1222 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1222/])
    HIVE-2750 Hive multi group by single reducer optimization causes invalid column reference error (Kevin Wilfong via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236150
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out

                

[jira] [Commented] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

Posted by "Namit Jain (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193555#comment-13193555 ] 

Namit Jain commented on HIVE-2750:
----------------------------------

+1
                

[jira] [Updated] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

Posted by "Kevin Wilfong (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-2750:
--------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Updated] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2750:
------------------------------

    Attachment: HIVE-2750.D1455.1.patch

kevinwilfong requested code review of "HIVE-2750 [jira] Hive multi group by single reducer optimization causes invalid column reference error".
Reviewers: JIRA

  When generating the list of value columns for the reduce sink operator, in the case where multiple group bys occur in the same reducer, only the columns used by the first query block were being considered, due to a typo. This patch fixes the typo and adds a test case to ensure the error does not recur.
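  A rough illustration of the difference the fix makes follows. It is a simplified, hypothetical model, not the actual SemanticAnalyzer.java code. The sketch compares two ways of building the reducer's column list when several GROUP BY query blocks share one reducer:
  - taking the columns of the first query block only;
  - taking the union of the columns referenced by every query block.
  The class, record and method names are invented for illustration. The model also ignores Hive's split between key and value columns. It uses the two inserts from the example query. With those inserts, the first-block-only list drops src.value, which is the column named in the error.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of collecting the columns a shared reducer
 * must receive when several GROUP BY query blocks run in one reducer.
 * Not Hive's actual implementation.
 */
public class ReduceSinkValueColumns {

    /** One query block (INSERT destination) and the input columns it references. */
    public record QueryBlock(String dest, List<String> referencedColumns) {}

    /** Buggy behaviour: only the first query block's columns are forwarded. */
    public static Set<String> valueColumnsFirstOnly(List<QueryBlock> blocks) {
        return new LinkedHashSet<>(blocks.get(0).referencedColumns());
    }

    /** Fixed behaviour: forward the union of columns referenced by every block. */
    public static Set<String> valueColumnsAllBlocks(List<QueryBlock> blocks) {
        Set<String> cols = new LinkedHashSet<>(); // keeps first-seen order
        for (QueryBlock qb : blocks) {
            cols.addAll(qb.referencedColumns());
        }
        return cols;
    }

    /** Columns a block needs that the reducer will not receive. */
    public static List<String> missingColumns(QueryBlock qb, Set<String> forwarded) {
        List<String> missing = new ArrayList<>();
        for (String c : qb.referencedColumns()) {
            if (!forwarded.contains(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<QueryBlock> blocks = List.of(
            new QueryBlock("dest_g2", List.of("src.key")),
            new QueryBlock("dest_g3", List.of("src.key", "src.value")));

        Set<String> buggy = valueColumnsFirstOnly(blocks);
        Set<String> fixed = valueColumnsAllBlocks(blocks);

        // dest_g3 needs src.value, which the first-block-only list omits.
        System.out.println("first block only: " + buggy
            + ", dest_g3 missing " + missingColumns(blocks.get(1), buggy));
        System.out.println("all blocks: " + fixed
            + ", dest_g3 missing " + missingColumns(blocks.get(1), fixed));
    }
}
```

  Running main prints "first block only: [src.key], dest_g3 missing [src.value]" followed by "all blocks: [src.key, src.value], dest_g3 missing []". A LinkedHashSet is used so that the column order stays deterministic while duplicates are dropped.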

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D1455

AFFECTED FILES
  ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
  ql/src/test/queries/clientpositive/groupby_multi_single_reducer2.q
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
