Posted to issues@spark.apache.org by "dimtiris kanoute (Jira)" <ji...@apache.org> on 2022/02/24 17:38:00 UTC
[jira] [Created] (SPARK-38319) Implement Strict Mode to prevent QUERY the entire table
dimtiris kanoute created SPARK-38319:
----------------------------------------
Summary: Implement Strict Mode to prevent QUERY the entire table
Key: SPARK-38319
URL: https://issues.apache.org/jira/browse/SPARK-38319
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 3.2.1
Reporter: dimtiris kanoute
We are using Spark Thrift Server as a service to run Spark SQL queries, along with Hive metastore as the metadata service.
We would like to restrict users from querying an entire table: queries should be required to include a {{WHERE}} clause on a partition column (i.e. {{SELECT * FROM table WHERE partition_column=<column_value>}}) *and* a {{LIMIT}} clause whenever {{ORDER BY}} is used.
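To illustrate, assuming a hypothetical table {{events}} partitioned by {{event_date}}, the proposed strict mode would behave roughly like this:

```sql
-- Rejected: no filter on the partition column, so the whole table is scanned
SELECT * FROM events;

-- Rejected: ORDER BY without LIMIT forces a full sort of the result
SELECT * FROM events WHERE event_date = '2022-02-24' ORDER BY user_id;

-- Accepted: partition filter present and ORDER BY is bounded by LIMIT
SELECT * FROM events WHERE event_date = '2022-02-24' ORDER BY user_id LIMIT 100;
```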
This behaviour is similar to what Hive exposes through the configurations
{{hive.strict.checks.no.partition.filter}}
{{hive.strict.checks.orderby.no.limit}}
which are described here:
[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1812]
and
[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816]
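For comparison, enabling the analogous checks in a Hive session looks like this:

```sql
-- Reject queries on partitioned tables that lack a partition filter
SET hive.strict.checks.no.partition.filter=true;

-- Reject queries that use ORDER BY without a LIMIT
SET hive.strict.checks.orderby.no.limit=true;
```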
This is a pretty common use case / feature that we see in other tools as well, for example in BigQuery: [https://cloud.google.com/bigquery/docs/querying-partitioned-tables#require_a_partition_filter_in_queries].
It would be nice to have this feature implemented in Spark when Hive support is enabled in a Spark session.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)