Posted to dev@hive.apache.org by "Aniket Adnaik (Jira)" <ji...@apache.org> on 2021/06/15 05:33:00 UTC

[jira] [Created] (HIVE-25244) Hive predicate pushdown with Parquet format for `date` as partitioned column name produce empty resultset

Aniket Adnaik created HIVE-25244:
------------------------------------

             Summary: Hive predicate pushdown with Parquet format for `date` as partitioned column name produce empty resultset
                 Key: HIVE-25244
                 URL: https://issues.apache.org/jira/browse/HIVE-25244
             Project: Hive
          Issue Type: Bug
          Components: Hive, Parquet
    Affects Versions: 3.1.2, 3.1.1, 3.1.0
            Reporter: Aniket Adnaik
             Fix For: 3.2.0, 3.1.2, 3.1.1, 3.1.0
         Attachments: test_table3_data.tar.gz

Hive predicate pushdown with the Parquet format produces an empty result set when the partition column is named with a keyword, in this case `date`.

If any one of the following configs is set to false, the select query returns results:

hive.optimize.ppd.storage, hive.optimize.ppd, hive.optimize.index.filter
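The three settings above can be toggled one at a time in a Beeline session to see which of them changes the result. This is a minimal sketch, assuming the `test_table3` table from the repro steps below already exists; restoring each property to true afterwards is an assumption about the session's starting values, not something stated in this report.

```sql
-- Disable one setting, rerun the failing query, then restore it before trying the next.
SET hive.optimize.ppd.storage=false;
SELECT * FROM test_table3 WHERE `date`='05172021';
SET hive.optimize.ppd.storage=true;    -- assumed starting value

SET hive.optimize.ppd=false;
SELECT * FROM test_table3 WHERE `date`='05172021';
SET hive.optimize.ppd=true;            -- assumed starting value

SET hive.optimize.index.filter=false;
SELECT * FROM test_table3 WHERE `date`='05172021';
SET hive.optimize.index.filter=true;   -- assumed starting value
```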

Repro steps:

--------------

1) Create an external partitioned table in Hive

CREATE EXTERNAL TABLE `test_table3`(`id` string) PARTITIONED BY (`date` string) STORED AS parquet;

2) In spark-shell, create a DataFrame and write the data as a Parquet file

import java.sql.Timestamp

import org.apache.spark.sql.Row

import org.apache.spark.sql.types._

import spark.implicits._

val someDF = Seq(("1", "05172021"),("2", "05172021"), ("3", "06182021"), ("4", "07192021")).toDF("id", "date")

someDF.write.mode("overwrite").parquet("<prefix path>/hive/warehouse/external/test_table3/date=05172021")

3) Change the HDFS permissions, then add the partition to the table in Hive

$> hdfs dfs -chmod -R 777 <prefix path>/hive/warehouse/external/test_table3

Hive Beeline ->

ALTER TABLE test_table3 ADD PARTITION(`date`='05172021') LOCATION '<prefix path>/hive/warehouse/external/test_table3/date=05172021';

4) SELECT * FROM test_table3;   <----- produces all rows

SELECT * FROM test_table3 WHERE `date`='05172021';   <--- produces no rows   

SET hive.optimize.ppd.storage=false;  <--- turn off storage-level predicate pushdown

SELECT * FROM test_table3 WHERE `date`='05172021'; <--- produces rows after setting above config to false
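To compare how the predicate is planned with and without storage-level pushdown, the same query can be run under EXPLAIN. This is only a diagnostic sketch; the plan output is not part of this report, so none is shown here.

```sql
-- Plan with storage-level pushdown enabled (the failing case).
SET hive.optimize.ppd.storage=true;
EXPLAIN SELECT * FROM test_table3 WHERE `date`='05172021';

-- Plan with storage-level pushdown disabled (the working case).
SET hive.optimize.ppd.storage=false;
EXPLAIN SELECT * FROM test_table3 WHERE `date`='05172021';
```

Any difference between the two plans in how the `date` predicate appears on the table scan, if there is one, would help narrow down where the filter is being applied.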

Attaching Parquet data files for reference: test_table3_data.tar.gz


--
This message was sent by Atlassian Jira
(v8.3.4#803005)