Posted to user@hive.apache.org by Hiroyuki Yamada <mo...@gmail.com> on 2012/08/29 08:27:09 UTC

num_rows is always 0 in statistics

Hi,

I have run the "analyze table" command several times to get statistics,
but I always get num_rows=0, as shown below.
(raw_data_size is also 0.)

-----
hive> analyze table lineitem compute statistics;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201208291425_0011, Tracking URL = http://hadoop-node1:50030/jobdetails.jsp?jobid=job_201208291425_0011
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=hadoop-node1:8021 -kill job_201208291425_0011
Hadoop job information for Stage-0: number of mappers: 3; number of reducers: 0
2012-08-29 15:16:16,133 Stage-0 map = 0%,  reduce = 0%
2012-08-29 15:16:20,154 Stage-0 map = 100%,  reduce = 0%
2012-08-29 15:16:22,168 Stage-0 map = 100%,  reduce = 100%
Ended Job = job_201208291425_0011
Table sf1.lineitem stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 759863287, raw_data_size: 0]
-----

I tried versions 0.7.1, 0.8.1, and 0.9.0 and got
the same result.
Is there anything else I have to do to make it work?

Also, do statistics only work for managed tables?
I tried it on external tables and it doesn't seem to work (all the
values are 0).

Thanks,

Hiroyuki

Re: num_rows is always 0 in statistics

Posted by Hiroyuki Yamada <mo...@gmail.com>.
Hi,

Sorry, it works now. Thank you.
However, the value is not correct (it is about half the real number of rows).
Is this a sampled value?
As far as I can tell from TableScanOperator.java, it seems to count every row.

Thanks,

Hiroyuki
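
One way to check the "about half" observation is to compare the stored
statistic with an actual count. The sketch below assumes the same lineitem
table as in the log above and assumes the gathered statistics appear among
the table parameters printed by "describe extended"; the exact output layout
may differ between Hive versions.

-----
hive> describe extended lineitem;
hive> select count(*) from lineitem;
-----

If the two numbers differ, re-running "analyze table lineitem compute
statistics;" and comparing again can help show whether the gap is stable or
changes from run to run.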

On Wed, Aug 29, 2012 at 5:39 PM, Hiroyuki Yamada <mo...@gmail.com> wrote:
> Hi,
>
> Thank you for the reply.
> I tried with the following setting, but I got the same result. (with num_rows=0)
>
> hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true
>
> Is there any clue ?

Re: num_rows is always 0 in statistics

Posted by Hiroyuki Yamada <mo...@gmail.com>.
Hi,

Thank you for the reply.
I tried the following setting, but I got the same result (num_rows=0).

hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/tmp/TempStatsStore;create=true

Do you have any other clue?

On Wed, Aug 29, 2012 at 4:09 PM, rohithsharma <ro...@huawei.com> wrote:
> I resolved the issue with following way.
>
> Configure
> "hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore".
> This works only in single node cluster.
>
>
> Please check HIVE-3324.

RE: num_rows is always 0 in statistics

Posted by rohithsharma <ro...@huawei.com>.
I resolved the issue in the following way.

Configure
"hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore".
This works only in a single-node cluster.


Please check HIVE-3324. 
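
A minimal sketch of applying this setting for a single session, assuming it
is passed with the --hiveconf option when starting the Hive CLI (the value is
quoted so that the shell does not interpret the semicolons). The path
/home/TempStore is the one from the setting above; any local path writable by
the job would presumably serve. The single-node limitation is probably
because an embedded Derby database at a local path is reachable only from
processes on that machine.

-----
$ hive --hiveconf "hive.stats.dbconnectionstring=jdbc:derby:;databaseName=/home/TempStore"
hive> analyze table lineitem compute statistics;
-----

The same property can also be set permanently in hive-site.xml.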

