Posted to user@hive.apache.org by Suma Shivaprasad <su...@gmail.com> on 2014/07/24 09:02:42 UTC
Column Stats with parquet
I am trying to enable column statistics usage with Parquet tables. Below are
the settings and the query I am executing. However, in the explain output,
*Basic stats* is shown as *COMPLETE* but *Column stats* is shown as *NONE*.
Can someone please explain what else I need to do to debug or fix this?
set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;
analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics;
explain select min(a), max(b), min(c) from user_table;
hive> explain select min(a), max(b), min(c) from user_table;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: user_table
            Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: double), b (type: double), c (type: int)
              outputColumnNames: a, b, c
              Statistics: Num rows: 55490383 Data size: 1831182639 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: min(a), max(b), min(c)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
      Reduce Operator Tree:
        Group By Operator
          aggregations: min(VALUE._col0), max(VALUE._col1), min(VALUE._col2)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: double), _col1 (type: double), _col2 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Thanks