Posted to dev@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2014/09/12 07:56:33 UTC

[jira] [Updated] (HIVE-8062) Stats collection for columns fails on a partitioned table with null values in partitioning column

     [ https://issues.apache.org/jira/browse/HIVE-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-8062:
-----------------------------------
    Attachment: HIVE-8062.patch

> Stats collection for columns fails on a partitioned table with null values in partitioning column
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8062
>                 URL: https://issues.apache.org/jira/browse/HIVE-8062
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.14.0
>            Reporter: Deepesh Khandelwal
>         Attachments: HIVE-8062.patch
>
>
> Steps to reproduce:
> 1. Create a data file abc.txt with the following contents:
> {noformat}
> a,1
> b,
> {noformat}
> 2. Use the Hive CLI to create a staging table, load the file into it, and populate the partitioned table with a dynamic-partition insert:
> {noformat}
> hive> create table abc(a string, b int);
> OK
> Time taken: 0.272 seconds
> hive> load data local inpath 'abc.txt' into table abc;
> Loading data to table default.abc
> Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
> OK
> Time taken: 0.463 seconds
> hive> create table abc1(a string) partitioned by (b int);
> OK
> Time taken: 0.098 seconds
> hive> set hive.exec.dynamic.partition.mode=nonstrict;
> hive> insert overwrite table abc1 partition (b) select a, b from abc;
> Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)
> Map 1: -/-	Reducer 2: 0/1
> Map 1: 0/1	Reducer 2: 0/1
> Map 1: 0(+1)/1	Reducer 2: 0/1
> Map 1: 1/1	Reducer 2: 0(+1)/1
> Map 1: 1/1	Reducer 2: 0/1
> Map 1: 1/1	Reducer 2: 1/1
> Status: Finished successfully
> Loading data to table default.abc1 partition (b=null)
> 	Loading partition {b=__HIVE_DEFAULT_PARTITION__}
> Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=2, totalSize=7, rawDataSize=5]
> OK
> Time taken: 7.49 seconds
> {noformat}
> 3. Run the analyze statement to compute column statistics on the partitioned table:
> {noformat}
> hive> analyze table abc1 partition (b) compute statistics for columns;
> Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)
> Map 1: 0(+1)/1	Reducer 2: 0/1
> Map 1: 1/1	Reducer 2: 0(+1)/1
> Map 1: 1/1	Reducer 2: 1/1
> Status: Finished successfully
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
> {noformat}
> The job itself finishes successfully, but column statistics collection then fails in ColumnStatsTask with return code 1.
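
The insert log in step 2 reports numRows=2 for the single partition b=__HIVE_DEFAULT_PARTITION__, meaning both rows ended up with a null partition key, not only the "b," row. One plausible explanation, assuming table abc uses Hive's default text field delimiter (Ctrl-A, since the create statement has no ROW FORMAT clause), is that each whole line is read into column a and column b is null. The Python sketch below only mimics that behaviour for illustration. parse_row, partition_name and FIELD_DELIM are hypothetical names, not Hive APIs, and the int parsing is a simplification of what Hive's SerDe does.

```python
# Sketch only: mimics how a text row might be split into (a, b) for table abc.
# FIELD_DELIM is an assumption: Hive's default text field delimiter is Ctrl-A
# when CREATE TABLE carries no ROW FORMAT clause.
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"
FIELD_DELIM = "\x01"


def parse_row(line, delim=FIELD_DELIM):
    """Split one line into (a, b); b is None when missing or not an int."""
    fields = line.split(delim)
    a = fields[0]
    raw_b = fields[1] if len(fields) > 1 else ""
    try:
        b = int(raw_b)
    except ValueError:
        b = None
    return a, b


def partition_name(b):
    """A null partition key is routed to the default partition."""
    return DEFAULT_PARTITION if b is None else str(b)


# Contents of abc.txt from step 1.
rows = ["a,1", "b,"]
partitions = [partition_name(parse_row(r)[1]) for r in rows]
print(partitions)
```

Under this assumption both rows map to the default partition, which matches the single partition in the insert log. Calling parse_row with delim="," instead would send the first row to partition b=1 and only the second row to the default partition.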



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)