Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2017/02/28 00:00:48 UTC

[jira] [Comment Edited] (HIVE-16040) union column expansion should take aliases from the leftmost branch

    [ https://issues.apache.org/jira/browse/HIVE-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886799#comment-15886799 ] 

Sergey Shelukhin edited comment on HIVE-16040 at 2/28/17 12:00 AM:
-------------------------------------------------------------------

[~ashutoshc] see the TODO that this patch removes.
Hive structures unions, etc. with more than 2 sides as a sort of unbalanced binary tree: the last query of the union is the right child, all the others form the left branch, and so on recursively.
E.g.
{noformat}
TOK_UNION
  TOK_UNION
    TOK_QUERY
    TOK_QUERY
  TOK_QUERY
{noformat}
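A rough sketch of how that shape comes about, assuming the branches are simply folded left to right. This is an illustration only, not Hive's parser code; Node, build_union and dump are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an AST node; not Hive's ASTNode class."""
    token: str
    children: List["Node"] = field(default_factory=list)
    label: str = ""


def build_union(queries):
    """Fold the queries left to right into a left-deep TOK_UNION tree.

    Each step wraps everything built so far as the left child and puts
    the next query on the right, so the last query ends up as the right
    child of the root.
    """
    tree = queries[0]
    for query in queries[1:]:
        tree = Node("TOK_UNION", [tree, query])
    return tree


def dump(node, depth=0):
    """Return the tree as indented lines, two spaces per level."""
    lines = ["  " * depth + node.token]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    three = [Node("TOK_QUERY", label=name) for name in ("q1", "q2", "q3")]
    print("\n".join(dump(build_union(three))))
```

For three queries this gives the same layout as the tree above: the first two queries sit under the nested TOK_UNION and the third is the right child of the root.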
However, that means the first query of the union, which Hive uses to get column aliases, is not the one the original patch was looking at. It was looking at the least nested select, which would be the rightmost (the last) query of the union.

So what we do is find the parent of the tree and, if we are not its left-most child, go into the left (first) sub-tree, find the select again, and repeat. For a 2-query union, it will find the correct select immediately.
It's a little bit wasteful: in a multi-union we'd always find the wrong select, backtrack to find the left side, find the wrong select inside the left side, backtrack again one level lower, and so on until we get to the level where both children are on the same level. But it should protect against finding selects in unexpected places like subquery expressions, etc.
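To make the loop concrete, here is a minimal sketch of the search on a toy tree. It is not the code from the patch: Node, make_query, find_select and leftmost_select are hypothetical names, the "least nested select" lookup is assumed to be a breadth-first search, and the query subtree is simplified to TOK_QUERY > TOK_INSERT > TOK_SELECT.

```python
from collections import deque


class Node:
    """Hypothetical AST node with a parent pointer; not Hive's ASTNode."""

    def __init__(self, token, children=(), label=""):
        self.token = token
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def make_query(label):
    """Simplified query subtree (assumed shape, for illustration only)."""
    select = Node("TOK_SELECT", label=label)
    return Node("TOK_QUERY", [Node("TOK_INSERT", [select])])


def find_select(root):
    """Breadth-first search: the least nested TOK_SELECT under root."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.token == "TOK_SELECT":
            return node
        queue.extend(node.children)
    return None


def leftmost_select(root):
    """Re-run the search in the left sub-tree until the select found
    belongs to the left-most branch of its union."""
    select = find_select(root)
    while select is not None:
        # Walk up from the select to the query that owns it.
        query = select
        while query is not None and query.token != "TOK_QUERY":
            query = query.parent
        union = query.parent if query is not None else None
        if union is None or union.token != "TOK_UNION":
            return select  # not inside a union at all
        if union.children[0] is query:
            return select  # already the left-most child
        # Wrong side: descend into the left (first) sub-tree and retry.
        select = find_select(union.children[0])
    return None


def build_union(queries):
    """Left-deep union tree: the last query is the root's right child."""
    tree = queries[0]
    for query in queries[1:]:
        tree = Node("TOK_UNION", [tree, query])
    return tree
```

On a 4-query union built this way, find_select alone returns the select of the last query, while leftmost_select first lands on the 4th and 3rd queries before settling on the first one, which is the wasted work described above.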


was (Author: sershe):
[~ashutoshc] see the TODO that this patch removes.
The way Hive structures the unions/etc w/more than 3 sides is an unbalanced binary tree sortof, where the last query of the union is the right child, and all the others are the left branch, and so on recursively.
E.g.
{noformat}
TOK_UNION
  TOK_UNION
      TOK_QUERY
      TOK_QUERY
  TOK_QUERY
{noformat}
  However, that means that the first query of the union, that Hive uses to get column aliases, is not the one that the original patch was looking at - it was looking at the least nested select, which would be the rightmost (the last) query of the union.

So what we do is find the parent of the tree and, if we are not the left-most child, go into the left (first) sub-tree and find select again, then repeat. For a 2-query union, it will find the correct select immediately.
It's a little bit wasteful (because in multi-union we'd always find the wrong select, then backtrack to find the left side, then find the wrong select inside the left side, then backtrack again one level lower, etc. until we get to the level where both children are on the same level) but it should protect against finding selects in unexpected places like subquery expressions, etc.

> union column expansion should take aliases from the leftmost branch
> -------------------------------------------------------------------
>
>                 Key: HIVE-16040
>                 URL: https://issues.apache.org/jira/browse/HIVE-16040
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16040.01.patch, HIVE-16040.02.patch, HIVE-16040.patch
>
>



