Posted to dev@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2013/11/16 03:41:21 UTC

[jira] [Updated] (HIVE-5817) column name to index mapping in VectorizationContext is broken

     [ https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-5817:
-----------------------------------

    Attachment: HIVE-5817-uniquecols.broken.patch
                HIVE-5817.00-broken.patch

Here are two patches. The unique-cols patch is an abandoned attempt at solution 1.
The main one is for solution (2). With it I was able to compile queries properly, but running them fails. It is probably related to passing serialized data through the scratch map, but I cannot get MR logging to work in tests, so it is hard to tell.

If someone wants to take over either patch, feel free. I will get back to it the Monday after next at the earliest.

> column name to index mapping in VectorizationContext is broken
> --------------------------------------------------------------
>
>                 Key: HIVE-5817
>                 URL: https://issues.apache.org/jira/browse/HIVE-5817
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: HIVE-5817-uniquecols.broken.patch, HIVE-5817.00-broken.patch
>
>
> Columns coming from different operators may have the same internal names ("_colNN"). There is a query of the form {{select b.cb, a.ca from a JOIN b ON ... JOIN x ON ...;}} (distilled from a more complex query) that runs fine without vectorization. With vectorization, it runs fine for most ca, but for some ca it fails (or can probably return incorrect results). The reason is that when the column-to-VRG-index map is built in VectorizationContext, the internal column name for ca that the first map join operator adds to the mapping may be the same as the internal name for cb that the second one tries to add. The second VMJ doesn't add it (see the code in the ctor), so when it is time for it to produce output, it retrieves the wrong index from the map by name, and then the wrong vector from the VRG.
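The collision described above can be reproduced in isolation. The following is only a minimal sketch of the failure mode, not Hive's code: the class NameToIndexMap, its methods, the name "_col0", and the indices 3 and 7 are all hypothetical and chosen for illustration. It assumes, as the description says, that a name already present in the map is not added a second time.

```java
import java.util.HashMap;
import java.util.Map;

public class ColumnMapCollisionSketch {
    // Hypothetical stand-in for a column-name-to-vector-index map; not Hive's actual class.
    static final class NameToIndexMap {
        private final Map<String, Integer> indexByName = new HashMap<>();

        // Assumed behavior from the report: a name that already exists is not re-added.
        void addIfAbsent(String internalName, int vectorIndex) {
            indexByName.putIfAbsent(internalName, vectorIndex);
        }

        // Looks up the index by internal name; the name must have been registered.
        int lookup(String internalName) {
            return indexByName.get(internalName);
        }
    }

    // Returns the index the second join resolves when it looks up its own column by name.
    static int resolveAfterCollision() {
        NameToIndexMap map = new NameToIndexMap();
        map.addIfAbsent("_col0", 3); // first map join registers a.ca at (illustrative) index 3
        map.addIfAbsent("_col0", 7); // second map join tries to register b.cb at index 7; skipped
        return map.lookup("_col0");  // the second join asks for b.cb by its internal name
    }

    public static void main(String[] args) {
        System.out.println("index resolved for b.cb: " + resolveAfterCollision());
    }
}
```

In this sketch the lookup yields the index registered by the first operator rather than the one the second operator intended, which corresponds to reading the wrong vector from the batch.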



--
This message was sent by Atlassian JIRA
(v6.1#6144)