Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/02 21:58:00 UTC

[jira] [Work logged] (HIVE-22248) Min value for column in stats is not set correctly for some data types

     [ https://issues.apache.org/jira/browse/HIVE-22248?focusedWorklogId=322299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322299 ]

ASF GitHub Bot logged work on HIVE-22248:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Oct/19 21:57
            Start Date: 02/Oct/19 21:57
    Worklog Time Spent: 10m 
      Work Description: miklosgergely commented on pull request #801: HIVE-22248 Fix statistics persisting issues
URL: https://github.com/apache/hive/pull/801
 
 
   - During the thrift call, the XXXXColumnStatsDataInspector was transformed into an XXXXColumnStatsData object, which was then converted back by calling the xxxxInspectorFromStats functions. The new object was never put back into the aggregateStats, though, so all the modifications made by the XXXXColumnStatsMerger were made on an object that was never used again. Added aggregateColStats.getStatsData().setXXXXStats(aggregateData); calls to put them back, so the changes made by the merger actually take effect.
   
   - The min value was miscalculated for Long and Double types, as the null (unset) value was treated as 0. It was fixed by also checking the isSetLowValue() function when calculating the min values.
   
   - In the case of vector_coalesce_3.q, the bad statistics made the engine "think" that the column was a primary key, following some heuristics based on statistics, and made it estimate the statistics in a different way; hence the change in the output.
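
The first point above (merging into a converted copy that is never stored back) can be illustrated with a minimal sketch. This is not the Hive code: LongStatsData, StatsHolder, inspectorFromStats and the putBack flag are hypothetical stand-ins for XXXXColumnStatsData, the stats data held by aggregateColStats, the xxxxInspectorFromStats functions, and the added setXXXXStats(aggregateData) call.

```java
public class PutBackSketch {
    /** Hypothetical plain stats object, standing in for XXXXColumnStatsData. */
    static class LongStatsData {
        long numNulls;
        LongStatsData(long numNulls) { this.numNulls = numNulls; }
    }

    /** Hypothetical holder, standing in for the stats data inside aggregateColStats. */
    static class StatsHolder {
        private LongStatsData longStats;
        LongStatsData getLongStats() { return longStats; }
        void setLongStats(LongStatsData s) { longStats = s; }
    }

    /** Models the conversion step: returns a NEW object, not the one the holder keeps. */
    static LongStatsData inspectorFromStats(StatsHolder holder) {
        return new LongStatsData(holder.getLongStats().numNulls);
    }

    /** Merges into the converted copy; the holder only sees the result if putBack is true. */
    static void merge(StatsHolder aggregate, StatsHolder incoming, boolean putBack) {
        LongStatsData aggregateData = inspectorFromStats(aggregate);
        aggregateData.numNulls += incoming.getLongStats().numNulls;
        if (putBack) {
            // Assumed analogue of aggregateColStats.getStatsData().setXXXXStats(aggregateData)
            aggregate.setLongStats(aggregateData);
        }
    }

    public static void main(String[] args) {
        StatsHolder aggregate = new StatsHolder();
        aggregate.setLongStats(new LongStatsData(1));
        StatsHolder incoming = new StatsHolder();
        incoming.setLongStats(new LongStatsData(2));

        merge(aggregate, incoming, false);
        System.out.println(aggregate.getLongStats().numNulls); // 1: merge result was lost
        merge(aggregate, incoming, true);
        System.out.println(aggregate.getLongStats().numNulls); // 3: merge result was kept
    }
}
```

Without the put-back step the merger still runs, but its output lives only in the temporary copy, which matches the symptom described above.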
 
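The min value point above can be sketched the same way. The LongStats class below is a hypothetical stand-in, not Hive's class; it assumes that an unset primitive low value reads back as 0, which is how the "null treated as 0" behaviour described above would arise. The fixed version only lets values that were actually set take part in the minimum.

```java
import java.util.OptionalLong;

public class LowValueMergeSketch {
    /** Hypothetical stand-in for per-partition long column stats; not Hive's real class. */
    static final class LongStats {
        private long lowValue;        // reads as 0 while unset (assumption, see lead-in)
        private boolean lowValueSet;

        void setLowValue(long v) { lowValue = v; lowValueSet = true; }
        long getLowValue() { return lowValue; }
        boolean isSetLowValue() { return lowValueSet; }
    }

    /** Buggy variant: an unset low value reads as 0 and beats any positive minimum. */
    static long naiveMergeLow(LongStats a, LongStats b) {
        return Math.min(a.getLowValue(), b.getLowValue());
    }

    /** Fixed variant: only low values that were actually set are compared. */
    static OptionalLong mergeLow(LongStats a, LongStats b) {
        if (a.isSetLowValue() && b.isSetLowValue()) {
            return OptionalLong.of(Math.min(a.getLowValue(), b.getLowValue()));
        }
        if (a.isSetLowValue()) return OptionalLong.of(a.getLowValue());
        if (b.isSetLowValue()) return OptionalLong.of(b.getLowValue());
        return OptionalLong.empty(); // neither side has a minimum
    }

    public static void main(String[] args) {
        LongStats withData = new LongStats();
        withData.setLowValue(5);
        LongStats allNulls = new LongStats(); // low value never set

        System.out.println(naiveMergeLow(withData, allNulls)); // 0
        System.out.println(mergeLow(withData, allNulls));      // OptionalLong[5]
    }
}
```
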
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 322299)
    Remaining Estimate: 0h
            Time Spent: 10m

> Min value for column in stats is not set correctly for some data types
> ----------------------------------------------------------------------
>
>                 Key: HIVE-22248
>                 URL: https://issues.apache.org/jira/browse/HIVE-22248
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Miklos Gergely
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22248.01.patch, HIVE-22248.02.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am not sure whether the problem is in printing the value or in the value stored in the metastore itself, but for some types (e.g. tinyint, smallint, int, bigint, double or float), the min value does not seem to be set correctly (it is set to 0).
> https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/alter_table_update_status.q.out#L342



--
This message was sent by Atlassian Jira
(v8.3.4#803005)