Posted to dev@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2013/02/03 23:22:12 UTC

[jira] [Commented] (HIVE-3962) Number of distinct values are wrong in column statistics

    [ https://issues.apache.org/jira/browse/HIVE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569917#comment-13569917 ] 

Ashutosh Chauhan commented on HIVE-3962:
----------------------------------------

I am not sure we intend to make statistics absolutely correct. These stats are approximate and are only meant for the optimizer to use for better query planning; they are not meant to be used for actual query processing. [~shreepadma], is that correct? If so, shall we mark this as "won't fix"?
                
> Number of distinct values are wrong in column statistics
> --------------------------------------------------------
>
>                 Key: HIVE-3962
>                 URL: https://issues.apache.org/jira/browse/HIVE-3962
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.10.0
>            Reporter: Amareshwari Sriramadasu
>
> When we run the following query on the Hive QL src table:
> select count(distinct(key)), count(distinct(value)) from src;
> 309 309
> After running the following analyze query, the stats in the metastore seem wrong:
> analyze table src compute statistics for columns key, value; 
> --- stats in metastore ---
> mysql> select * from TAB_COL_STATS where TABLE_NAME="src";
> | CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
> |     5 | default | src        | key         | int         |     11 |              0 |             498 |            0.0000 |           0.0000 | NULL                  | NULL                   |         0 |           291 |      0.0000 |           0 |         0 |          0 |    1359539181 |
> |     6 | default | src        | value       | string      |     11 |              0 |               0 |            0.0000 |           0.0000 | NULL                  | NULL                   |         0 |           112 |      6.8120 |           7 |         0 |          0 |    1359539181 |
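To put the discrepancy in numbers, the following Python snippet computes the signed relative error of each NUM_DISTINCTS value against the exact count returned by the count(distinct ...) query in the report. It uses only the figures given in the issue; the function name `relative_error` is our own.

```python
# Values copied from the bug report: exact counts from count(distinct ...),
# estimates from NUM_DISTINCTS in TAB_COL_STATS.
EXACT = {"key": 309, "value": 309}
ESTIMATED = {"key": 291, "value": 112}


def relative_error(estimated: int, exact: int) -> float:
    """Signed relative error; negative means the estimate is too low."""
    return (estimated - exact) / exact


for column in ("key", "value"):
    err = relative_error(ESTIMATED[column], EXACT[column])
    print(f"{column}: exact={EXACT[column]} estimated={ESTIMATED[column]} error={err:+.1%}")
# key: exact=309 estimated=291 error=-5.8%
# value: exact=309 estimated=112 error=-63.8%
```

Both columns have the same exact count, yet the key estimate is about 6% low while the value estimate is about 64% low; whether the larger gap is within the expected error depends on how the estimator is configured.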
