You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by 徐 鹏 <xu...@outlook.com> on 2016/12/22 07:48:52 UTC

hive2.1.0 one partition has two locations

Hi all:


HQL:

Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR=2016,MONTH=12,DAY=04)
LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';

INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR=2016,MONTH=12,DAY=04)
SELECT
    '2016-12-04',
    EventID,
    PlatID,
    ProvinceID,
    CityID,
    RefID,
    '',
    '',
    '',
    COUNT(1),
    COUNT(DISTINCT DeviceID)
FROM TCRecSys_ApplyData.OperatingStat_D_EventList
WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;

expected result:

  *   partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’

actual result:

  *   partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'
  *   partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4'

desc format info:

hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=04);
OK
# col_name              data_type               comment

createdate              string
eventid                 string
platid                  string
provinceid              string
cityid                  string
refid                   string
def1                    string
def2                    string
def3                    string
pv                      int
uv                      int

# Partition Information
# col_name              data_type               comment

year                    int
month                   int
day                     int

# Detailed Partition Information
Partition Value:        [2016, 12, 04]
Database:               tcrecsys_applydata
Table:                  operatingstat_r_view
CreateTime:             Mon Dec 05 10:46:27 CST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04
Partition Parameters:
        COLUMN_STATS_ACCURATE   false
        numFiles                0
        numRows                 -1
        rawDataSize             -1
        totalSize               0
        transient_lastDdlTime   1480905987

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ^
        serialization.format    ^
Time taken: 0.485 seconds, Fetched: 48 row(s)
hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=4);
OK
# col_name              data_type               comment

createdate              string
eventid                 string
platid                  string
provinceid              string
cityid                  string
refid                   string
def1                    string
def2                    string
def3                    string
pv                      int
uv                      int

# Partition Information
# col_name              data_type               comment

year                    int
month                   int
day                     int

# Detailed Partition Information
Partition Value:        [2016, 12, 4]
Database:               tcrecsys_applydata
Table:                  operatingstat_r_view
CreateTime:             Mon Dec 05 10:46:07 CST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4
Partition Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                59
        numRows                 0
        rawDataSize             0
        totalSize               49869
        transient_lastDdlTime   1480906019

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ^
        serialization.format    ^
Time taken: 0.087 seconds, Fetched: 48 row(s)

drop the partition, both locations dropped:

hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition (year=2016,month=12,day=4);
Dropped the partition year=2016/month=12/day=04
Dropped the partition year=2016/month=12/day=4


now :

Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR='2016',MONTH='12',DAY='04')
LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';

INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR='2016',MONTH='12',DAY='04')
SELECT
    '2016-12-04',
    EventID,
    PlatID,
    ProvinceID,
    CityID,
    RefID,
    '',
    '',
    '',
    COUNT(1),
    COUNT(DISTINCT DeviceID)
FROM TCRecSys_ApplyData.OperatingStat_D_EventList
WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;


What’s problem?



Best Regards

Re: hive2.1.0 one partition has two locations

Posted by 徐 鹏 <xu...@outlook.com>.
Hi Elliot:
  I raised a new issue :https://issues.apache.org/jira/browse/HIVE-15512

在 2016年12月22日,下午5:05,Elliot West <te...@gmail.com>> 写道:

 rais


Re: hive2.1.0 one partition has two locations

Posted by Elliot West <te...@gmail.com>.
I believe there is an issue with non-string type partition values. On some
code path point they are incorrectly compared as strings when a numeric
comparison should be used instead. Consequently, as '04' ≠ '4' you get two
different partitions. To work around this you should ensure that only one
numerical partition key format is used: always strip leading zeros. I've
had a look in the Hive JIRA and can find no related issues that haven't
been fixed for Hive 2.1.0, so perhaps this is a new find. I'd suggest
raising a new issue: https://issues.apache.org/jira/browse/HIVE

Thanks,

Elliot.

On Thu, 22 Dec 2016 at 07:49, 徐 鹏 <xu...@outlook.com> wrote:

>
>
>
>
>
>
>
>
>
>
> Hi all:
>
>
>
>
>
>
>
>
>
>
>
>
> HQL:
>
>
> Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR=2016,MONTH=12,DAY=04)
>
> LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
>
>
>
> INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR=2016,MONTH=12,DAY=04)
>
> SELECT
>
>     '2016-12-04',
>
>     EventID,
>
>     PlatID,
>
>     ProvinceID,
>
>     CityID,
>
>     RefID,
>
>     '',
>
>     '',
>
>     '',
>
>     COUNT(1),
>
>     COUNT(DISTINCT DeviceID)
>
> FROM TCRecSys_ApplyData.OperatingStat_D_EventList
>
> WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
>
> GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
>
>
>
> expected result:
>
>
>
>
>
>
>
>    - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/
>    TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’
>
>
>
> actual result:
>
>
>
>
>
>
>
>    - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/
>    TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'
>    - partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/
>    TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4'
>
>
>
> desc format info:
>
>
>
>
>
>
> hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=04);
>
> OK
>
> # col_name              data_type               comment
>
>
>
> createdate              string
>
> eventid                 string
>
> platid                  string
>
> provinceid              string
>
> cityid                  string
>
> refid                   string
>
> def1                    string
>
> def2                    string
>
> def3                    string
>
> pv                      int
>
> uv                      int
>
>
>
> # Partition Information
>
> # col_name              data_type               comment
>
>
>
> year                    int
>
> month                   int
>
> day                     int
>
>
>
> # Detailed Partition Information
>
> Partition Value:        [2016, 12, 04]
>
> Database:               tcrecsys_applydata
>
> Table:                  operatingstat_r_view
>
> CreateTime:             Mon Dec 05 10:46:27 CST 2016
>
> LastAccessTime:         UNKNOWN
>
> Protect Mode:           None
>
> Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04
>
> Partition Parameters:
>
>         COLUMN_STATS_ACCURATE   false
>
>         numFiles                0
>
>         numRows                 -1
>
>         rawDataSize             -1
>
>         totalSize               0
>
>         transient_lastDdlTime   1480905987
>
>
>
> # Storage Information
>
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
>
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Compressed:             No
>
> Num Buckets:            -1
>
> Bucket Columns:         []
>
> Sort Columns:           []
>
> Storage Desc Params:
>
>         field.delim             ^
>
>         serialization.format    ^
>
> Time taken: 0.485 seconds, Fetched: 48 row(s)
>
> hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=4);
>
> OK
>
> # col_name              data_type               comment
>
>
>
> createdate              string
>
> eventid                 string
>
> platid                  string
>
> provinceid              string
>
> cityid                  string
>
> refid                   string
>
> def1                    string
>
> def2                    string
>
> def3                    string
>
> pv                      int
>
> uv                      int
>
>
>
> # Partition Information
>
> # col_name              data_type               comment
>
>
>
> year                    int
>
> month                   int
>
> day                     int
>
>
>
> # Detailed Partition Information
>
> Partition Value:        [2016, 12, 4]
>
> Database:               tcrecsys_applydata
>
> Table:                  operatingstat_r_view
>
> CreateTime:             Mon Dec 05 10:46:07 CST 2016
>
> LastAccessTime:         UNKNOWN
>
> Protect Mode:           None
>
> Location:               hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4
>
> Partition Parameters:
>
>         COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>
>         numFiles                59
>
>         numRows                 0
>
>         rawDataSize             0
>
>         totalSize               49869
>
>         transient_lastDdlTime   1480906019
>
>
>
> # Storage Information
>
> SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat:            org.apache.hadoop.mapred.TextInputFormat
>
> OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Compressed:             No
>
> Num Buckets:            -1
>
> Bucket Columns:         []
>
> Sort Columns:           []
>
> Storage Desc Params:
>
>         field.delim             ^
>
>         serialization.format    ^
>
> Time taken: 0.087 seconds, Fetched: 48 row(s)
>
>
>
> drop the partition, both locations dropped:
>
>
>
>
>
>
> hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition (year=2016,month=12,day=4);
>
> Dropped the partition year=2016/month=12/day=04
>
> Dropped the partition year=2016/month=12/day=4
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> now :
>
>
>
>
> Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR='2016',MONTH='12',DAY='04')
>
> LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
>
>
>
> INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR='2016',MONTH='12',DAY='04')
>
> SELECT
>
>     '2016-12-04',
>
>     EventID,
>
>     PlatID,
>
>     ProvinceID,
>
>     CityID,
>
>     RefID,
>
>     '',
>
>     '',
>
>     '',
>
>     COUNT(1),
>
>     COUNT(DISTINCT DeviceID)
>
> FROM TCRecSys_ApplyData.OperatingStat_D_EventList
>
> WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
>
> GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> What’s problem?
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Best Regards
>
>
>
>
>
>
>
>
>