You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Goden Yao (JIRA)" <ji...@apache.org> on 2016/07/14 17:40:21 UTC
[jira] [Updated] (HIVE-10888) Hive Dynamic Partition + Default
Partition makes Null Values Not querable
[ https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Goden Yao updated HIVE-10888:
-----------------------------
Description:
This is reported by Noa Horn
And HAWQ latest version should have this fixed in our queries.
=== Expected Behavior ===
When dynamic partition enabled and mode = nonstrict, the null value in the default partition should still be returned when user specify that in "...WHERE.... is Null".
=== Problem statment ===
*Enable dynamic partitions*
{code}
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
#Get default partition name:
hive.exec.default.partition.name
Default Value: _HIVE_DEFAULT_PARTITION_
{code}
Hive creates a default partition if the partition key value doesn’t conform to the field type. For example, if the partition key is NULL.
*Hive Example*
Add the following parameters to hive-site.xml
{code}
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>true</value>
</property>
{code}
Create data:
vi /tmp/base_data.txt
1,1.0,1900-01-01
2,2.2,1994-04-14
3,3.3,2011-03-31
4,4.5,bla
5,5.0,2013-12-06
Create hive table and load the data to it. This table is used to load data to the partition table.
{code}
hive>
CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
SELECT * FROM base;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
4 4.5 NULL
5 5.0 2013-12-06
{code}
Note that one of the rows has NULL in its date field.
Create hive partition table and load data from base table to it. The data will be dynamically partitioned
{code}
CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
SELECT * FROM sales;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
5 5.0 2013-12-06
4 4.5 NULL
{code}
Check that the table has different partitions
{code}
hdfs dfs -ls /hive/warehouse/sales
Found 5 items
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
{code}
Hive queries with default partition
Queries without a filter or with a filter on a different field returns the default partition data:
{code}
hive> select * from sales;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
5 5.0 2013-12-06
4 4.5 NULL
Time taken: 0.578 seconds, Fetched: 5 row(s)
{code}
Queries with a filter on the partition field omit the default partition data:
{code}
hive> select * from sales where date <> '2013-12-06';
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
Time taken: 0.19 seconds, Fetched: 3 row(s)
hive> select * from sales where date is null;
OK
Time taken: 0.035 seconds
hive> select * from sales where date is not null;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
5 5.0 2013-12-06
Time taken: 0.042 seconds, Fetched: 4 row(s)
hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
OK
Time taken: 0.056 seconds
{code}
was:
This is reported by @hor
And HAWQ latest version should have this fixed in our queries.
=== Expected Behavior ===
When dynamic partition enabled and mode = nonstrict, the null value in the default partition should still be returned when user specify that in "...WHERE.... is Null".
=== Problem statment ===
*Enable dynamic partitions*
{code}
hive.exec.dynamic.partition = true
hive.exec.dynamic.partition.mode = nonstrict
#Get default partition name:
hive.exec.default.partition.name
Default Value: _HIVE_DEFAULT_PARTITION_
{code}
Hive creates a default partition if the partition key value doesn’t conform to the field type. For example, if the partition key is NULL.
*Hive Example*
Add the following parameters to hive-site.xml
{code}
<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>true</value>
</property>
{code}
Create data:
vi /tmp/base_data.txt
1,1.0,1900-01-01
2,2.2,1994-04-14
3,3.3,2011-03-31
4,4.5,bla
5,5.0,2013-12-06
Create hive table and load the data to it. This table is used to load data to the partition table.
{code}
hive>
CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
SELECT * FROM base;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
4 4.5 NULL
5 5.0 2013-12-06
{code}
Note that one of the rows has NULL in its date field.
Create hive partition table and load data from base table to it. The data will be dynamically partitioned
{code}
CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
SELECT * FROM sales;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
5 5.0 2013-12-06
4 4.5 NULL
{code}
Check that the table has different partitions
{code}
hdfs dfs -ls /hive/warehouse/sales
Found 5 items
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
{code}
Hive queries with default partition
Queries without a filter or with a filter on a different field returns the default partition data:
{code}
hive> select * from sales;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
5 5.0 2013-12-06
4 4.5 NULL
Time taken: 0.578 seconds, Fetched: 5 row(s)
{code}
Queries with a filter on the partition field omit the default partition data:
{code}
hive> select * from sales where date <> '2013-12-06';
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
Time taken: 0.19 seconds, Fetched: 3 row(s)
hive> select * from sales where date is null;
OK
Time taken: 0.035 seconds
hive> select * from sales where date is not null;
OK
1 1.0 1900-01-01
2 2.2 1994-04-14
3 3.3 2011-03-31
5 5.0 2013-12-06
Time taken: 0.042 seconds, Fetched: 4 row(s)
hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
OK
Time taken: 0.056 seconds
{code}
> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> -------------------------------------------------------------------------
>
> Key: HIVE-10888
> URL: https://issues.apache.org/jira/browse/HIVE-10888
> Project: Hive
> Issue Type: Bug
> Components: Hive, Query Processor
> Reporter: Goden Yao
>
> This is reported by Noa Horn
> And HAWQ latest version should have this fixed in our queries.
> === Expected Behavior ===
> When dynamic partition enabled and mode = nonstrict, the null value in the default partition should still be returned when user specify that in "...WHERE.... is Null".
> === Problem statment ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
> #Get default partition name:
> hive.exec.default.partition.name
> Default Value: _HIVE_DEFAULT_PARTITION_
> {code}
> Hive creates a default partition if the partition key value doesn’t conform to the field type. For example, if the partition key is NULL.
> *Hive Example*
> Add the following parameters to hive-site.xml
> {code}
> <property>
> <name>hive.exec.dynamic.partition</name>
> <value>true</value>
> </property>
> <property>
> <name>hive.exec.dynamic.partition.mode</name>
> <value>true</value>
> </property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
> Create hive table and load the data to it. This table is used to load data to the partition table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1 1.0 1900-01-01
> 2 2.2 1994-04-14
> 3 3.3 2011-03-31
> 4 4.5 NULL
> 5 5.0 2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create hive partition table and load data from base table to it. The data will be dynamically partitioned
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1 1.0 1900-01-01
> 2 2.2 1994-04-14
> 3 3.3 2011-03-31
> 5 5.0 2013-12-06
> 4 4.5 NULL
> {code}
> Check that the table has different partitions
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x - nhorn supergroup 0 2015-04-30 15:03 /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with default partition
> Queries without a filter or with a filter on a different field returns the default partition data:
> {code}
> hive> select * from sales;
> OK
> 1 1.0 1900-01-01
> 2 2.2 1994-04-14
> 3 3.3 2011-03-31
> 5 5.0 2013-12-06
> 4 4.5 NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1 1.0 1900-01-01
> 2 2.2 1994-04-14
> 3 3.3 2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1 1.0 1900-01-01
> 2 2.2 1994-04-14
> 3 3.3 2011-03-31
> 5 5.0 2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)