You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Navdeep Agrawal <Na...@symantec.com> on 2014/07/22 19:14:16 UTC

Hive Stats

i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using


set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))


RE: Hive Stats

Posted by Navdeep Agrawal <Na...@symantec.com>.
Hi ,
I am unable to see COLUMN_STATS_ACCURATE in desc formatted/extended ,but I think u r pointing to stats.reliable I have made that to false  .

From: Bala Krishna Gangisetty [mailto:bala@altiscale.com]
Sent: Wednesday, July 23, 2014 12:46 AM
To: user@hive.apache.org
Subject: Re: Hive Stats

What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED <table_name>" output under "Table Parameters" section. My suspect is, it's false even after executing your analyze query. If so, Hive launches the mapred jobs.

Try using below analyze query:

analyze table mytable partition(server_date=’2013-11-30′) compute statistics;

--Bala G.

On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal <Na...@symantec.com>> wrote:

i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))



RE: Hive Stats

Posted by Navdeep Agrawal <Na...@symantec.com>.
i digged into partitions params and checked out column_stats_accurate in metastore but found it to be true .

From: Bala Krishna Gangisetty [mailto:bala@altiscale.com]
Sent: Wednesday, July 23, 2014 12:46 AM
To: user@hive.apache.org
Subject: Re: Hive Stats

What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED <table_name>" output under "Table Parameters" section. My suspect is, it's false even after executing your analyze query. If so, Hive launches the mapred jobs.

Try using below analyze query:

analyze table mytable partition(server_date=’2013-11-30′) compute statistics;

--Bala G.

On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal <Na...@symantec.com>> wrote:

i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch these statistics .

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats to get correct result(through map reduce max(load_inst_id))



Re: Hive Stats

Posted by Bala Krishna Gangisetty <ba...@altiscale.com>.
What is the value of "COLUMN_STATS_ACCURATE" field in "DESCRIBE FORMATTED
<table_name>" output under "Table Parameters" section. My suspect is, it's
false even after executing your analyze query. If so, Hive launches the
mapred jobs.

Try using below analyze query:

analyze table mytable partition(server_date=’2013-11-30′) compute
statistics;

--Bala G.


On Tue, Jul 22, 2014 at 10:14 AM, Navdeep Agrawal <
Navdeep_Agrawal@symantec.com> wrote:

> i am trying to compute statistics on ORC File but i am unable see any
> changes in PART_COL_STATS as well on using
>
> set hive.compute.query.using.stats=true;
> set hive.stats.reliable=true;
> set hive.stats.fetch.column.stats=true;
> set hive.stats.fetch.partition.stats=true;
> set hive.cbo.enable=true;
>
> to get max value of a column it is running full Map reduce on column ..
> what i want to use is max value stored in meta store ,but i am unable to
> catch these statistics .
>
> my table desc is
> load_inst_id int
> src_filename string
> server_date date
>
> my analyze query is
> analyze table mytable partition(server_date=’2013-11-30′) compute
> statistics for columns load_inst_id;
>
> i am always getting 0 as loadinstant id ,i have to turn off my
> hive.compute.query.using.stats to get correct result(through map reduce
> max(load_inst_id))
>
>
>