Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/08/23 19:27:00 UTC

[jira] [Resolved] (IMPALA-1988) show column stats returns different results for beeswax and hs2

     [ https://issues.apache.org/jira/browse/IMPALA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-1988.
-----------------------------------
    Resolution: Duplicate

> show column stats returns different results for beeswax and hs2
> ---------------------------------------------------------------
>
>                 Key: IMPALA-1988
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1988
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: Impala 2.2
>            Reporter: Alex Leblang
>              Labels: compute-stats, hs2
>
> In the Impala shell, "show column stats compute_stats_db.alltypes" works as expected and returns the stats. When the same statement is executed through impyla over hs2, the second-to-last column, Max Size, is always None.
> To reproduce:
> In the Impala shell:
> [localhost:21000] > show column stats compute_stats_db.alltypes;
> Query: show column stats compute_stats_db.alltypes
> +-----------------+-----------+------------------+--------+----------+----------+
> | Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
> +-----------------+-----------+------------------+--------+----------+----------+
> | id              | INT       | 8161             | -1     | 4        | 4        |
> | bool_col        | BOOLEAN   | 2                | -1     | 1        | 1        |
> | tinyint_col     | TINYINT   | 10               | -1     | 1        | 1        |
> | smallint_col    | SMALLINT  | 10               | -1     | 2        | 2        |
> | int_col         | INT       | 10               | -1     | 4        | 4        |
> | bigint_col      | BIGINT    | 10               | -1     | 8        | 8        |
> | float_col       | FLOAT     | 10               | -1     | 4        | 4        |
> | double_col      | DOUBLE    | 10               | -1     | 8        | 8        |
> | date_string_col | STRING    | 666              | -1     | 8        | 8        |
> | string_col      | STRING    | 10               | -1     | 1        | 1        |
> | timestamp_col   | TIMESTAMP | 5678             | -1     | 16       | 16       |
> | year            | INT       | 2                | 0      | 4        | 4        |
> | month           | INT       | 12               | 0      | 4        | 4        |
> +-----------------+-----------+------------------+--------+----------+----------+
> Fetched 13 row(s) in 0.01s
> In IPython (plain Python works equally well for this):
> In [1]: from impala.dbapi import connect
> In [2]: conn = connect()
> In [3]: cur = conn.cursor()
> In [4]: cur.execute("show column stats compute_stats_db.alltypes")
> In [5]: cur.fetchall()
> Out[5]: 
> [('id', 'INT', 8161, -1, None, 4.0),
>  ('bool_col', 'BOOLEAN', 2, -1, None, 1.0),
>  ('tinyint_col', 'TINYINT', 10, -1, None, 1.0),
>  ('smallint_col', 'SMALLINT', 10, -1, None, 2.0),
>  ('int_col', 'INT', 10, -1, None, 4.0),
>  ('bigint_col', 'BIGINT', 10, -1, None, 8.0),
>  ('float_col', 'FLOAT', 10, -1, None, 4.0),
>  ('double_col', 'DOUBLE', 10, -1, None, 8.0),
>  ('date_string_col', 'STRING', 666, -1, None, 8.0),
>  ('string_col', 'STRING', 10, -1, None, 1.0),
>  ('timestamp_col', 'TIMESTAMP', 5678, -1, None, 16.0),
>  ('year', 'INT', 2, 0, None, 4.0),
>  ('month', 'INT', 12, 0, None, 4.0)]
> In [6]: cur.description
> Out[6]: 
> [('Column', 'STRING', None, None, None, None, None),
>  ('Type', 'STRING', None, None, None, None, None),
>  ('#Distinct Values', 'BIGINT', None, None, None, None, None),
>  ('#Nulls', 'BIGINT', None, None, None, None, None),
>  ('Max Size', 'INT', None, None, None, None, None),
>  ('Avg Size', 'DOUBLE', None, None, None, None, None)]
> For those unfamiliar with impyla:
> fetchall() returns the query results; each tuple is a row.
> description holds the column labels and types, e.g. the first column is named Column and is of type STRING.
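
The symptom described above (one column coming back as None in every row) can be checked mechanically. The following is a minimal sketch, not part of impyla or of the original report: all_none_columns is a hypothetical helper, and the sample data is copied from the first three rows of the fetchall() output and from the cur.description output quoted above. It assumes only the DB-API shapes shown in the report, where each description entry starts with the column name.

```python
def all_none_columns(description, rows):
    """Return names of columns whose value is None in every fetched row.

    Hypothetical helper for illustration; not an impyla API.
    """
    names = [col[0] for col in description]
    if not rows:
        # With no rows there is nothing to flag.
        return []
    return [
        name
        for idx, name in enumerate(names)
        if all(row[idx] is None for row in rows)
    ]


# cur.description as quoted in the report.
description = [
    ('Column', 'STRING', None, None, None, None, None),
    ('Type', 'STRING', None, None, None, None, None),
    ('#Distinct Values', 'BIGINT', None, None, None, None, None),
    ('#Nulls', 'BIGINT', None, None, None, None, None),
    ('Max Size', 'INT', None, None, None, None, None),
    ('Avg Size', 'DOUBLE', None, None, None, None, None),
]

# First three rows of cur.fetchall() as quoted in the report.
rows = [
    ('id', 'INT', 8161, -1, None, 4.0),
    ('bool_col', 'BOOLEAN', 2, -1, None, 1.0),
    ('tinyint_col', 'TINYINT', 10, -1, None, 1.0),
]

print(all_none_columns(description, rows))  # prints ['Max Size']
```

Against a live connection, the same helper could presumably be called as all_none_columns(cur.description, cur.fetchall()) after the execute() call shown in the report.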


