Posted to issues@spark.apache.org by "dimtiris kanoute (Jira)" <ji...@apache.org> on 2022/03/01 10:11:00 UTC
[jira] [Updated] (SPARK-38319) Implement Strict Mode to prevent QUERY the entire table
[ https://issues.apache.org/jira/browse/SPARK-38319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dimtiris kanoute updated SPARK-38319:
-------------------------------------
Description:
We are using Spark Thrift Server as a service to run Spark SQL queries along with Hive metastore as the metadata service.
We would like to restrict users from querying the entire table: force them to use a {{WHERE}} clause based on the partition column (i.e. {{SELECT * FROM table WHERE partition_column=<column_value>}}) *and* to {{LIMIT}} the output of the query when {{ORDER BY}} is used.
This behaviour is similar to the checks Hive exposes via configuration:
{{hive.strict.checks.no.partition.filter}}
{{hive.strict.checks.orderby.no.limit}}
which are described here: [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1812]
and
[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816]
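To make the first check concrete, here is a minimal sketch of what enforcing a partition filter could look like. This is purely illustrative: the function name is invented, and the naive regex inspection is not how Hive or Spark actually analyze queries (they work on the parsed plan, not the SQL text).

```python
import re

def check_partition_filter(sql: str, partition_cols: list[str]) -> None:
    """Illustrative sketch of hive.strict.checks.no.partition.filter:
    reject a query over a partitioned table unless its WHERE clause
    references at least one partition column. Regex parsing here is a
    simplification for demonstration only."""
    match = re.search(r"\bWHERE\b(.*)", sql, re.IGNORECASE | re.DOTALL)
    where_clause = match.group(1) if match else ""
    if not any(re.search(rf"\b{re.escape(col)}\b", where_clause, re.IGNORECASE)
               for col in partition_cols):
        raise ValueError("Strict mode: query must filter on a partition column: "
                         + ", ".join(partition_cols))

# Allowed: the query filters on the partition column.
check_partition_filter(
    "SELECT * FROM events WHERE event_date = '2022-03-01'", ["event_date"])
```

A full-table scan such as {{SELECT * FROM events}} would be rejected by this check.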
This is a fairly common use case that other tools support as well; in BigQuery, for example: [https://cloud.google.com/bigquery/docs/querying-partitioned-tables#require_a_partition_filter_in_queries]
It would be nice to have this feature implemented in Spark when Hive support is enabled in a Spark session.
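The second check (no unbounded {{ORDER BY}}) can be sketched in the same illustrative style; again, the function name and the text-based inspection are assumptions for demonstration, not Spark or Hive internals.

```python
import re

def check_orderby_has_limit(sql: str) -> None:
    """Illustrative sketch of hive.strict.checks.orderby.no.limit:
    a query that sorts its full output with ORDER BY must also bound
    the result with LIMIT."""
    has_order_by = re.search(r"\bORDER\s+BY\b", sql, re.IGNORECASE) is not None
    has_limit = re.search(r"\bLIMIT\s+\d+", sql, re.IGNORECASE) is not None
    if has_order_by and not has_limit:
        raise ValueError("Strict mode: ORDER BY without LIMIT is not allowed")

# Allowed: the sorted output is bounded.
check_orderby_has_limit("SELECT * FROM events ORDER BY ts LIMIT 100")
```

A query like {{SELECT * FROM events ORDER BY ts}} would be rejected, since sorting an unbounded result forces a single reducer over the whole table.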
Environment: [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816]
> Implement Strict Mode to prevent QUERY the entire table
> ---------------------------------------------------------
>
> Key: SPARK-38319
> URL: https://issues.apache.org/jira/browse/SPARK-38319
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.2.1
> Environment: [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816]
> Reporter: dimtiris kanoute
> Priority: Minor
> Labels: feature-request, improvement
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org