Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/11/26 04:15:00 UTC

[jira] [Commented] (SPARK-25774) Eliminate query anomalies with empty partitions - TRUNCATE, SELECT DISTINCT, etc.

    [ https://issues.apache.org/jira/browse/SPARK-25774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698463#comment-16698463 ] 

Apache Spark commented on SPARK-25774:
--------------------------------------

User 'lcqzte10192193' has created a pull request for this issue:
https://github.com/apache/spark/pull/23140

> Eliminate query anomalies with empty partitions - TRUNCATE, SELECT DISTINCT, etc.
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-25774
>                 URL: https://issues.apache.org/jira/browse/SPARK-25774
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>         Environment: Right now, I'm using Cloudera with Spark 2.2.0, but I understand it's a widespread thing.
>            Reporter: Steven Cardella
>            Priority: Major
>
> If you run a Spark SQL TRUNCATE TABLE command on a managed table in Hive, it deletes the files in HDFS but leaves the partitions and the partition folder structure in place. If you then run SELECT DISTINCT on the partition columns, it returns all of the empty partition values. So SELECT DISTINCT can return rows while SELECT * on the same table returns 0 rows.
> In SQL Server and similar databases, SELECT DISTINCT always reflects the rows, and Impala works that way as well.
> I'd like SELECT DISTINCT to reflect rows, not partitions; TRUNCATE TABLE to have an option to drop partitions; and MSCK REPAIR TABLE to have an option to drop empty partitions.
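
The discrepancy described in the report can be sketched with a toy model. This is not Spark or Hive code: the PartitionedTable class, its method names, and the drop_partitions flag are all hypothetical, invented here only to show how an answer taken from the partition list can disagree with an answer taken from the rows once the data is truncated.

```python
# Toy model (NOT Spark/Hive code): a partitioned table is treated as a
# catalog of partition values plus the rows stored under each partition.
# All names here are hypothetical and exist only for illustration.
from dataclasses import dataclass, field


@dataclass
class PartitionedTable:
    # partition value -> list of rows stored under that partition
    partitions: dict = field(default_factory=dict)

    def insert(self, part, row):
        self.partitions.setdefault(part, []).append(row)

    def truncate(self, drop_partitions=False):
        """Delete all rows; optionally drop the partition entries too.

        drop_partitions models the option requested in the report;
        it is an assumption, not an existing TRUNCATE TABLE flag.
        """
        if drop_partitions:
            self.partitions.clear()
        else:
            for part in self.partitions:
                self.partitions[part] = []

    def select_all(self):
        return [(part, row)
                for part, rows in self.partitions.items()
                for row in rows]

    def distinct_from_metadata(self):
        # Answers from the partition list alone, so empty partitions appear.
        return sorted(self.partitions)

    def distinct_from_rows(self):
        # Answers from the rows, so empty partitions are invisible.
        return sorted({part for part, _ in self.select_all()})


t = PartitionedTable()
t.insert("2018-11-01", "a")
t.insert("2018-11-02", "b")
t.truncate()  # rows are gone, partition entries remain

print(t.select_all())              # []
print(t.distinct_from_metadata())  # ['2018-11-01', '2018-11-02']
print(t.distinct_from_rows())      # []
```

In this model, the behavior the reporter asks for corresponds to distinct_from_rows(), and calling truncate(drop_partitions=True) makes both answers agree because no empty partition entries are left behind.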



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
