Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2018/04/05 01:09:00 UTC
[jira] [Resolved] (HIVE-16919) Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
[ https://issues.apache.org/jira/browse/HIVE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline resolved HIVE-16919.
---------------------------------
Resolution: Fixed
> Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run. Vectorized unary function broken?
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16919
> URL: https://issues.apache.org/jira/browse/HIVE-16919
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
>
> Jason spotted differences in the query results for vectorization_short_regress.q.out: when vectorization is turned off and a base .q.out file is created, there are 2 differences.
> Both seem to be related to negation. For example, in the first one MAX(cint) and MIN(cint) appear earlier as columns and match between the non-vectorized and vectorized runs, so aggregation does not appear to be failing. It seems that now that the Reducer is vectorized, a bug is exposed: even though MAX and MIN match, the expression with negation returns different results.
> 19th field of the query below: Vectorized 511 vs Non-Vectorized -58
> {noformat}
> SELECT MAX(cint),
> (MAX(cint) / -3728),
> (MAX(cint) * -3728),
> VAR_POP(cbigint),
> (-((MAX(cint) * -3728))),
> STDDEV_POP(csmallint),
> (-563 % (MAX(cint) * -3728)),
> (VAR_POP(cbigint) / STDDEV_POP(csmallint)),
> (-(STDDEV_POP(csmallint))),
> MAX(cdouble),
> AVG(ctinyint),
> (STDDEV_POP(csmallint) - 10.175),
> MIN(cint),
> ((MAX(cint) * -3728) % (STDDEV_POP(csmallint) - 10.175)),
> (-(MAX(cdouble))),
> MIN(cdouble),
> (MAX(cdouble) % -26.28),
> STDDEV_SAMP(csmallint),
> (-((MAX(cint) / -3728))),
> ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
> ((MAX(cint) / -3728) - AVG(ctinyint)),
> (-((MAX(cint) * -3728))),
> VAR_SAMP(cint)
> FROM alltypesorc
> WHERE (((cbigint <= 197)
> AND (cint < cbigint))
> OR ((cdouble >= -26.28)
> AND (csmallint > cdouble))
> OR ((ctinyint > cfloat)
> AND (cstring1 RLIKE '.*ss.*'))
> OR ((cfloat > 79.553)
> AND (cstring2 LIKE '10%')))
> {noformat}
> Column expression is: ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))),
> -----------------------------------------------
> This is a pre-existing issue, now filed as HIVE-16919: "Vectorization: vectorization_short_regress.q has query result differences with non-vectorized run"
> 10th field of the query below: Non-Vectorized -6432.000015344526 vs. Vectorized -6432.0
> Column expression is (-(cdouble)) as c4,
> (Observed in the query results of vectorization_short_regress.q.out when vectorization is turned off and a base .q.out file is created.)
> -----------------------------------------------
> {noformat}
> SELECT ctimestamp1,
> cstring2,
> cdouble,
> cfloat,
> cbigint,
> csmallint,
> (cbigint / 3569) as c1,
> (-257 - csmallint) as c2,
> (-6432 * cfloat) as c3,
> (-(cdouble)) as c4,
> (cdouble * 10.175) as c5,
> ((-6432 * cfloat) / cfloat) as c6,
> (-(cfloat)) as c7,
> (cint % csmallint) as c8,
> (-(cdouble)) as c9,
> (cdouble * (-(cdouble))) as c10
> FROM alltypesorc
> WHERE (((-1.389 >= cint)
> AND ((csmallint < ctinyint)
> AND (-6432 > csmallint)))
> OR ((cdouble >= cfloat)
> AND (cstring2 <= 'a'))
> OR ((cstring1 LIKE 'ss%')
> AND (10.175 > cbigint)))
> {noformat}
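Consider the first difference: 511 vs -58 on ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728))). MAX(cint) agrees between the two runs. That means one way to decide which answer is expected is to recompute the expression from that single aggregated value outside Hive.

The Java sketch below does this with Java's remainder operator. Its result takes the sign of the dividend, so a nonzero negative result such as -58 implies that the dividend -(MAX(cint) * -3728) was negative.

The sketch evaluates the expression twice, once with 32-bit int and once with 64-bit long arithmetic. Whether the intermediate product wraps at 32 bits is one place where two code paths could plausibly diverge. That is an assumption to check, not the confirmed cause or fix of HIVE-16919.

The class name and the sample MAX(cint) value are hypothetical. The sketch does not model Hive's type rules or its handling of a zero divisor.

```java
// Hypothetical cross-check for the column expression
//   ((-((MAX(cint) * -3728))) % (-563 % (MAX(cint) * -3728)))
// evaluated from an already-aggregated MAX(cint) value with Java's % operator,
// whose nonzero result always has the sign of the dividend.
// Assumption: maxCint != 0 (a zero divisor would throw ArithmeticException here).
final class NegationModCheck {

    // 32-bit int arithmetic: the product silently wraps on overflow.
    static int withInt(int maxCint) {
        int product = maxCint * -3728;   // MAX(cint) * -3728
        int inner = -563 % product;      // -563 % (MAX(cint) * -3728)
        return (-product) % inner;       // (-(MAX(cint) * -3728)) % inner
    }

    // 64-bit long arithmetic: no overflow for any int-sized MAX(cint).
    static long withLong(long maxCint) {
        long product = maxCint * -3728L;
        long inner = -563L % product;
        return (-product) % inner;
    }

    public static void main(String[] args) {
        int maxCint = 1_000_000; // hypothetical value, not taken from alltypesorc
        System.out.println("int  arithmetic: " + withInt(maxCint));  // prints "int  arithmetic: -398"
        System.out.println("long arithmetic: " + withLong(maxCint)); // prints "long arithmetic: 353"
    }
}
```

With the hypothetical input 1,000,000 the two variants disagree in sign (-398 vs 353). This only illustrates what such a divergence looks like; it says nothing about the actual alltypesorc values.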
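The second difference (-6432.000015344526 vs -6432.0 in an (-(cdouble)) column) raises a narrower question: could the negation itself have changed the value? In Java, unary minus on a non-NaN double only inverts the sign bit and never rounds. If both evaluation paths negate with Java's double arithmetic (an assumption), a mismatch of this kind would have to come from the value that reaches the negation rather than from the negation step.

The sketch below checks that property bit for bit. The class name is hypothetical, and this is a reasoning aid, not a diagnosis of the bug.

```java
// Hypothetical check: Java's unary minus on a double flips only the sign bit,
// so it cannot turn 6432.000015344526 into 6432.0 (or the reverse).
final class DoubleNegationCheck {

    // True when -x has exactly the bits of x with bit 63 (the sign bit) flipped.
    static boolean negationIsExact(double x) {
        long bits = Double.doubleToRawLongBits(x);
        long negatedBits = Double.doubleToRawLongBits(-x);
        return negatedBits == (bits ^ Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Sample inputs; NaN is left out because its bit pattern is not guaranteed.
        double[] samples = {6432.0, 6432.000015344526, -0.0, 1.0e-300};
        for (double d : samples) {
            System.out.println(d + " -> " + (-d) + " exact=" + negationIsExact(d));
        }
    }
}
```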
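Both differences were found by comparing the non-vectorized base .q.out with the vectorized output and naming the field that differs. Below is a minimal sketch of that comparison for a single pair of rows. It assumes fields are tab-separated and counts them from 1. The class and method names, and the sample rows, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compare one row of non-vectorized output with the
// corresponding row of vectorized output and return the 1-based positions
// of the fields that differ. Assumes tab-separated fields.
final class RowFieldDiff {

    static List<Integer> differingFields(String nonVecRow, String vecRow) {
        String[] a = nonVecRow.split("\t", -1); // -1 keeps trailing empty fields
        String[] b = vecRow.split("\t", -1);
        List<Integer> diffs = new ArrayList<>();
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            String x = i < a.length ? a[i] : null; // null marks a missing field
            String y = i < b.length ? b[i] : null;
            if (x == null || !x.equals(y)) {
                diffs.add(i + 1); // report 1-based field numbers
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up rows; only the third field differs.
        String nonVec = "a\t1\t-6432.000015344526\tNULL";
        String vec = "a\t1\t-6432.0\tNULL";
        System.out.println(differingFields(nonVec, vec)); // prints [3]
    }
}
```

The comparison is on exact strings, so formatting-only differences (for example -6432.0 vs -6432) are flagged too. A numeric tolerance could be added if that is not wanted.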
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)