You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by 徐 鹏 <xu...@outlook.com> on 2016/12/22 07:48:52 UTC
hive2.1.0 one partition has two locations
Hi all:
HQL:
Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR=2016,MONTH=12,DAY=04)
LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR=2016,MONTH=12,DAY=04)
SELECT
'2016-12-04',
EventID,
PlatID,
ProvinceID,
CityID,
RefID,
'',
'',
'',
COUNT(1),
COUNT(DISTINCT DeviceID)
FROM TCRecSys_ApplyData.OperatingStat_D_EventList
WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
expected result:
* partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’
actual result:
* partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'
* partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4'
desc format info:
hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=04);
OK
# col_name data_type comment
createdate string
eventid string
platid string
provinceid string
cityid string
refid string
def1 string
def2 string
def3 string
pv int
uv int
# Partition Information
# col_name data_type comment
year int
month int
day int
# Detailed Partition Information
Partition Value: [2016, 12, 04]
Database: tcrecsys_applydata
Table: operatingstat_r_view
CreateTime: Mon Dec 05 10:46:27 CST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Location: hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04
Partition Parameters:
COLUMN_STATS_ACCURATE false
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1480905987
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ^
serialization.format ^
Time taken: 0.485 seconds, Fetched: 48 row(s)
hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=4);
OK
# col_name data_type comment
createdate string
eventid string
platid string
provinceid string
cityid string
refid string
def1 string
def2 string
def3 string
pv int
uv int
# Partition Information
# col_name data_type comment
year int
month int
day int
# Detailed Partition Information
Partition Value: [2016, 12, 4]
Database: tcrecsys_applydata
Table: operatingstat_r_view
CreateTime: Mon Dec 05 10:46:07 CST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Location: hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
numFiles 59
numRows 0
rawDataSize 0
totalSize 49869
transient_lastDdlTime 1480906019
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ^
serialization.format ^
Time taken: 0.087 seconds, Fetched: 48 row(s)
drop the partition, both locations dropped:
hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition (year=2016,month=12,day=4);
Dropped the partition year=2016/month=12/day=04
Dropped the partition year=2016/month=12/day=4
now :
Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR='2016',MONTH='12',DAY='04')
LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR='2016',MONTH='12',DAY='04')
SELECT
'2016-12-04',
EventID,
PlatID,
ProvinceID,
CityID,
RefID,
'',
'',
'',
COUNT(1),
COUNT(DISTINCT DeviceID)
FROM TCRecSys_ApplyData.OperatingStat_D_EventList
WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
What’s problem?
Best Regards
Re: hive2.1.0 one partition has two locations
Posted by 徐 鹏 <xu...@outlook.com>.
Hi Elliot:
I raised a new issue :https://issues.apache.org/jira/browse/HIVE-15512
在 2016年12月22日,下午5:05,Elliot West <te...@gmail.com>> 写道:
rais
Re: hive2.1.0 one partition has two locations
Posted by Elliot West <te...@gmail.com>.
I believe there is an issue with non-string type partition values. On some
code path point they are incorrectly compared as strings when a numeric
comparison should be used instead. Consequently, as '04' ≠ '4' you get two
different partitions. To work around this you should ensure that only one
numerical partition key format is used: always strip leading zeros. I've
had a look in the Hive JIRA and can find no related issues that haven't
been fixed for Hive 2.1.0, so perhaps this is a new find. I'd suggest
raising a new issue: https://issues.apache.org/jira/browse/HIVE
Thanks,
Elliot.
On Thu, 22 Dec 2016 at 07:49, 徐 鹏 <xu...@outlook.com> wrote:
>
>
>
>
>
>
>
>
>
>
> Hi all:
>
>
>
>
>
>
>
>
>
>
>
>
> HQL:
>
>
> Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR=2016,MONTH=12,DAY=04)
>
> LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
>
>
>
> INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR=2016,MONTH=12,DAY=04)
>
> SELECT
>
> '2016-12-04',
>
> EventID,
>
> PlatID,
>
> ProvinceID,
>
> CityID,
>
> RefID,
>
> '',
>
> '',
>
> '',
>
> COUNT(1),
>
> COUNT(DISTINCT DeviceID)
>
> FROM TCRecSys_ApplyData.OperatingStat_D_EventList
>
> WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
>
> GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
>
>
>
> expected result:
>
>
>
>
>
>
>
> - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/
> TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04’
>
>
>
> actual result:
>
>
>
>
>
>
>
> - partition(YEAR=2016,MONTH=12,DAY=04) LOCATION '/data/ApplicationDep/
> TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04'
> - partition(YEAR=2016,MONTH=12,DAY=4) LOCATION '/data/ApplicationDep/
> TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4'
>
>
>
> desc format info:
>
>
>
>
>
>
> hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=04);
>
> OK
>
> # col_name data_type comment
>
>
>
> createdate string
>
> eventid string
>
> platid string
>
> provinceid string
>
> cityid string
>
> refid string
>
> def1 string
>
> def2 string
>
> def3 string
>
> pv int
>
> uv int
>
>
>
> # Partition Information
>
> # col_name data_type comment
>
>
>
> year int
>
> month int
>
> day int
>
>
>
> # Detailed Partition Information
>
> Partition Value: [2016, 12, 04]
>
> Database: tcrecsys_applydata
>
> Table: operatingstat_r_view
>
> CreateTime: Mon Dec 05 10:46:27 CST 2016
>
> LastAccessTime: UNKNOWN
>
> Protect Mode: None
>
> Location: hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04
>
> Partition Parameters:
>
> COLUMN_STATS_ACCURATE false
>
> numFiles 0
>
> numRows -1
>
> rawDataSize -1
>
> totalSize 0
>
> transient_lastDdlTime 1480905987
>
>
>
> # Storage Information
>
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat: org.apache.hadoop.mapred.TextInputFormat
>
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Compressed: No
>
> Num Buckets: -1
>
> Bucket Columns: []
>
> Sort Columns: []
>
> Storage Desc Params:
>
> field.delim ^
>
> serialization.format ^
>
> Time taken: 0.485 seconds, Fetched: 48 row(s)
>
> hive> desc formatted TCRecSys_ApplyData.OperatingStat_R_View partition (year=2016,month=12,day=4);
>
> OK
>
> # col_name data_type comment
>
>
>
> createdate string
>
> eventid string
>
> platid string
>
> provinceid string
>
> cityid string
>
> refid string
>
> def1 string
>
> def2 string
>
> def3 string
>
> pv int
>
> uv int
>
>
>
> # Partition Information
>
> # col_name data_type comment
>
>
>
> year int
>
> month int
>
> day int
>
>
>
> # Detailed Partition Information
>
> Partition Value: [2016, 12, 4]
>
> Database: tcrecsys_applydata
>
> Table: operatingstat_r_view
>
> CreateTime: Mon Dec 05 10:46:07 CST 2016
>
> LastAccessTime: UNKNOWN
>
> Protect Mode: None
>
> Location: hdfs://hadoopcluster/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/year=2016/month=12/day=4
>
> Partition Parameters:
>
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
>
> numFiles 59
>
> numRows 0
>
> rawDataSize 0
>
> totalSize 49869
>
> transient_lastDdlTime 1480906019
>
>
>
> # Storage Information
>
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
> InputFormat: org.apache.hadoop.mapred.TextInputFormat
>
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>
> Compressed: No
>
> Num Buckets: -1
>
> Bucket Columns: []
>
> Sort Columns: []
>
> Storage Desc Params:
>
> field.delim ^
>
> serialization.format ^
>
> Time taken: 0.087 seconds, Fetched: 48 row(s)
>
>
>
> drop the partition, both locations dropped:
>
>
>
>
>
>
> hive> alter table TCRecSys_ApplyData.OperatingStat_R_View drop partition (year=2016,month=12,day=4);
>
> Dropped the partition year=2016/month=12/day=04
>
> Dropped the partition year=2016/month=12/day=4
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> now :
>
>
>
>
> Alter table OperatingStat_R_View Add IF NOT EXISTS partition(YEAR='2016',MONTH='12',DAY='04')
>
> LOCATION '/data/ApplicationDep/TCRecSys_ApplyData/OperatingStat/Report/View/2016/12/04';
>
>
>
> INSERT OVERWRITE TABLE TCRecSys_ApplyData.OperatingStat_R_View PARTITION (YEAR='2016',MONTH='12',DAY='04')
>
> SELECT
>
> '2016-12-04',
>
> EventID,
>
> PlatID,
>
> ProvinceID,
>
> CityID,
>
> RefID,
>
> '',
>
> '',
>
> '',
>
> COUNT(1),
>
> COUNT(DISTINCT DeviceID)
>
> FROM TCRecSys_ApplyData.OperatingStat_D_EventList
>
> WHERE YEAR=2016 AND MONTH=12 AND DAY=04 AND LabelID='01'
>
> GROUP BY EventID,PlatID,ProvinceID,CityID,RefID;
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> What’s problem?
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Best Regards
>
>
>
>
>
>
>
>
>