Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2017/06/20 06:57:00 UTC

[jira] [Comment Edited] (HIVE-16589) Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE for AVG, VARIANCE

    [ https://issues.apache.org/jira/browse/HIVE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047406#comment-16047406 ] 

Matt McCline edited comment on HIVE-16589 at 6/20/17 6:56 AM:
--------------------------------------------------------------

Jason spotted differences in the query results for vectorization_short_regress.q.out. That is, when vectorization is turned off and a base .q.out file is created, there are 2 differences:

19th field of the query below: Vectorized 511 vs Non-Vectorized -58

{noformat}
SELECT MAX(cint),
       (MAX(cint) / -3728),
       (MAX(cint) * -3728),
       VAR_POP(cbigint),
       (-((MAX(cint) * -3728))),
       STDDEV_POP(csmallint),
       (-563 % (MAX(cint) * -3728)),
       (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
       (-(STDDEV_POP(csmallint))),
       MAX(cdouble),
       AVG(ctinyint),
       (STDDEV_POP(csmallint) - 10.175),
       MIN(cint),
       ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
       (-(MAX(cdouble))),
       MIN(cdouble),
       (MAX(cdouble) % -26.28),
       STDDEV_SAMP(csmallint),
       (-((MAX(cint) / -3728))),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
       ((MAX(cint) / -3728) - AVG(ctinyint)),
       (-((MAX(cint) * -3728))),
       VAR_SAMP(cint)
FROM   alltypesorc
WHERE  (((cbigint <= 197)
         AND (cint < cbigint))
        OR ((cdouble >= -26.28)
            AND (csmallint > cdouble))
        OR ((ctinyint > cfloat)
            AND (cstring1 RLIKE '.*ss.*'))
           OR ((cfloat > 79.553)
               AND (cstring2 LIKE '10%')))
{noformat}

Column expression is:  (-((MAX(cint) / -3728))),

-----------------------------------------------

This is a pre-existing issue, now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run".

10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0

Column expression is (-(cdouble)) as c4,

{noformat}
SELECT   ctimestamp1,
         cstring2,
         cdouble,
         cfloat,
         cbigint,
         csmallint,
         (cbigint / 3569) as c1,
         (-257 - csmallint) as c2,
         (-6432 * cfloat) as c3,
         (-(cdouble)) as c4,
         (cdouble * 10.175) as c5,
         ((-6432 * cfloat) / cfloat) as c6,
         (-(cfloat)) as c7,
         (cint % csmallint) as c8,
         (-(cdouble)) as c9,
         (cdouble * (-(cdouble))) as c10
FROM     alltypesorc
WHERE    (((-1.389 >= cint)
           AND ((csmallint < ctinyint)
                AND (-6432 > csmallint)))
          OR ((cdouble >= cfloat)
              AND (cstring2 <= 'a'))
             OR ((cstring1 LIKE 'ss%')
                 AND (10.175 > cbigint)))
{noformat}
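
One way to reproduce the comparison described above is sketched below. It is not taken from the test file itself: it assumes the standard hive.vectorized.execution.enabled session property is what toggles vectorization, and that the alltypesorc test table is already loaded. The placeholders stand for either of the two SELECT statements quoted in this comment.

{noformat}
-- Baseline: row-mode (non-vectorized) execution.
SET hive.vectorized.execution.enabled=false;
-- ... run the SELECT under test here and save its output ...

-- Comparison: vectorized execution.
SET hive.vectorized.execution.enabled=true;
-- ... run the same SELECT again and diff the two outputs ...
{noformat}

Running the complete query in both modes, rather than only the differing column, keeps the query plan the same as the one that produced the reported values.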


was (Author: mmccline):

Jason spotted a difference in the query result for vectorization_short_regress.q.out -- that is when vectorization is turned off and a base .q.out file created, there are 2 differences:

19th field of the query below: Vectorized 511 vs Non-Vectorized -58

{noformat}
SELECT MAX(cint),
       (MAX(cint) / -3728),
       (MAX(cint) * -3728),
       VAR_POP(cbigint),
       (-((MAX(cint) * -3728))),
       STDDEV_POP(csmallint),
       (-563 % (MAX(cint) * -3728)),
       (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
       (-(STDDEV_POP(csmallint))),
       MAX(cdouble),
       AVG(ctinyint),
       (STDDEV_POP(csmallint) - 10.175),
       MIN(cint),
       ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
       (-(MAX(cdouble))),
       MIN(cdouble),
       (MAX(cdouble) % -26.28),
       STDDEV_SAMP(csmallint),
       (-((MAX(cint) / -3728))),
       ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
       ((MAX(cint) / -3728) - AVG(ctinyint)),
       (-((MAX(cint) * -3728))),
       VAR_SAMP(cint)
FROM   alltypesorc
WHERE  (((cbigint <= 197)
         AND (cint < cbigint))
        OR ((cdouble >= -26.28)
            AND (csmallint > cdouble))
        OR ((ctinyint > cfloat)
            AND (cstring1 RLIKE '.*ss.*'))
           OR ((cfloat > 79.553)
               AND (cstring2 LIKE '10%')))
{noformat}

Column expression is:  (-((MAX(cint) / -3728))),

-----------------------------------------------

10th field of the query below: Vectorized -6432.000015344526 vs. Non-Vectorized -6432.0

Column expression is (-(cdouble)) as c4,

{noformat}
SELECT   ctimestamp1,
         cstring2,
         cdouble,
         cfloat,
         cbigint,
         csmallint,
         (cbigint / 3569) as c1,
         (-257 - csmallint) as c2,
         (-6432 * cfloat) as c3,
         (-(cdouble)) as c4,
         (cdouble * 10.175) as c5,
         ((-6432 * cfloat) / cfloat) as c6,
         (-(cfloat)) as c7,
         (cint % csmallint) as c8,
         (-(cdouble)) as c9,
         (cdouble * (-(cdouble))) as c10
FROM     alltypesorc
WHERE    (((-1.389 >= cint)
           AND ((csmallint < ctinyint)
                AND (-6432 > csmallint)))
          OR ((cdouble >= cfloat)
              AND (cstring2 <= 'a'))
             OR ((cstring1 LIKE 'ss%')
                 AND (10.175 > cbigint)))
{noformat}

> Vectorization: Support Complex Types and GroupBy modes PARTIAL2, FINAL, and COMPLETE  for AVG, VARIANCE
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16589
>                 URL: https://issues.apache.org/jira/browse/HIVE-16589
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16589.01.patch, HIVE-16589.02.patch, HIVE-16589.03.patch, HIVE-16589.04.patch, HIVE-16589.05.patch, HIVE-16589.06.patch, HIVE-16589.07.patch, HIVE-16589.08.patch, HIVE-16589.091.patch, HIVE-16589.092.patch, HIVE-16589.093.patch, HIVE-16589.094.patch, HIVE-16589.095.patch, HIVE-16589.096.patch, HIVE-16589.097.patch, HIVE-16589.098.patch, HIVE-16589.0991.patch, HIVE-16589.099.patch, HIVE-16589.09.patch
>
>
> Allow Complex Types to be vectorized (since HIVE-16207: "Add support for Complex Types in Fast SerDe" was committed).
> Add more classes to vectorize AVG in preparation for fully supporting AVG GroupBy: in particular, the PARTIAL2 and FINAL GroupBy modes, which take the AVG struct as input. Also add the COMPLETE mode, which takes the original data and produces the full aggregation, for completeness.


