Posted to issues@hive.apache.org by "Aniket Adnaik (Jira)" <ji...@apache.org> on 2021/06/15 16:29:00 UTC

[jira] [Assigned] (HIVE-25244) Hive predicate pushdown with Parquet format for `date` as partitioned column name produce empty resultset

     [ https://issues.apache.org/jira/browse/HIVE-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik reassigned HIVE-25244:
------------------------------------

    Assignee: Aniket Adnaik

> Hive predicate pushdown with Parquet format for `date` as partitioned column name produce empty resultset
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25244
>                 URL: https://issues.apache.org/jira/browse/HIVE-25244
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Parquet
>    Affects Versions: 3.1.0, 3.1.1, 3.1.2
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>            Priority: Major
>             Fix For: 3.1.0, 3.1.1, 3.1.2, 3.2.0
>
>         Attachments: test_table3_data.tar.gz
>
>
> Hive predicate pushdown with Parquet format, for a partitioned column whose name is the keyword `date`, produces an empty result set.
> If any of the following configs is set to false, the select query returns results:
> hive.optimize.ppd.storage, hive.optimize.ppd, hive.optimize.index.filter.
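> As a session-level workaround (a sketch based only on the configs listed above; behavior should be verified against your Hive version), any one of these can be disabled in Beeline before running the query:
>
> ```sql
> -- Workaround sketch: disabling any one of these avoids the empty result set.
> SET hive.optimize.ppd.storage=false;   -- stop pushing predicates to the storage handler
> -- or
> SET hive.optimize.ppd=false;           -- disable predicate pushdown entirely
> -- or
> SET hive.optimize.index.filter=false;  -- disable automatic use of filters at the reader
>
> SELECT * FROM test_table3 WHERE `date`='05172021';
> ```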
> Repro steps:
> --------------
> 1) Create an external partitioned table in Hive:
> CREATE EXTERNAL TABLE `test_table3`(`id` string) PARTITIONED BY (`date` string) STORED AS parquet;
> 2) In spark-shell, create a data frame and write the data to a Parquet file:
> import spark.implicits._   // only the implicits are needed for toDF below
> val someDF = Seq(("1", "05172021"),("2", "05172021"), ("3", "06182021"), ("4", "07192021")).toDF("id", "date")
> someDF.write.mode("overwrite").parquet("<prefix path>/hive/warehouse/external/test_table3/date=05172021")
> 3) Change the HDFS permissions and add the partition to the table in Hive:
> $> hdfs dfs -chmod -R 777 <prefix path>/hive/warehouse/external/test_table3
> Hive Beeline ->
> ALTER TABLE test_table3 ADD PARTITION(`date`='05172021') LOCATION '<prefix path>/hive/warehouse/external/test_table3/date=05172021';
> 4) SELECT * FROM test_table3;   <----- produces all rows
> SELECT * FROM test_table3 WHERE `date`='05172021';   <--- produces no rows   
> SET hive.optimize.ppd.storage=false;  <--- turn off storage-level predicate pushdown
> SELECT * FROM test_table3 WHERE `date`='05172021'; <--- produces rows after setting the above config to false
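> To see whether the filter is actually being handed to the Parquet reader, the query plan can be compared with pushdown on and off (a diagnostic sketch; exact plan output varies by Hive version):
>
> ```sql
> -- With pushdown enabled, the predicate typically appears as a filterExpr on the
> -- TableScan; with hive.optimize.ppd.storage=false it shows up only as a Filter
> -- operator above the scan.
> SET hive.optimize.ppd.storage=true;
> EXPLAIN SELECT * FROM test_table3 WHERE `date`='05172021';
>
> SET hive.optimize.ppd.storage=false;
> EXPLAIN SELECT * FROM test_table3 WHERE `date`='05172021';
> ```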
> Attaching Parquet data files for reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)