Posted to issues@spark.apache.org by "dimtiris kanoute (Jira)" <ji...@apache.org> on 2022/03/01 10:11:00 UTC

[jira] [Updated] (SPARK-38319) Implement Strict Mode to prevent querying the entire table

     [ https://issues.apache.org/jira/browse/SPARK-38319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dimtiris kanoute updated SPARK-38319:
-------------------------------------
    Description: 
We are using Spark Thrift Server as a service to run Spark SQL queries, with the Hive metastore as the metadata service.

We would like to restrict users from querying the entire table and force them to use a {{WHERE}} clause on the partition column (i.e. {{SELECT * FROM TABLE WHERE partition_column=<column_value>}}), *and* to {{LIMIT}} the output of the query when {{ORDER BY}} is used.
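
A minimal sketch of the intended behaviour, assuming a hypothetical partitioned table {{sales}} with partition column {{dt}} (both names are made up for illustration):

{code:sql}
-- Rejected under the proposed strict mode: no filter on the partition column
SELECT * FROM sales;

-- Rejected: ORDER BY without a LIMIT forces an unbounded global sort
SELECT * FROM sales WHERE dt = '2022-03-01' ORDER BY amount;

-- Accepted: partition filter present and the sorted output is bounded
SELECT * FROM sales WHERE dt = '2022-03-01' ORDER BY amount LIMIT 100;
{code}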

This behaviour is similar to what Hive exposes via the configuration properties

{{hive.strict.checks.no.partition.filter}}

{{hive.strict.checks.orderby.no.limit}}

which are described here: [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1812]

and

[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816]
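
For reference, this is roughly how the checks behave in Hive itself, using the same hypothetical {{sales}} table as above:

{code:sql}
-- Hive session-level settings that enable the equivalent strict checks
SET hive.strict.checks.no.partition.filter=true;
SET hive.strict.checks.orderby.no.limit=true;

-- With these enabled, Hive rejects a scan of a partitioned table that has
-- no filter on any partition column, and an ORDER BY without a LIMIT.
SELECT * FROM sales;  -- fails with a strict-mode error
{code}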

 

This is a fairly common use case that we see in other tools as well, for example in BigQuery: [https://cloud.google.com/bigquery/docs/querying-partitioned-tables#require_a_partition_filter_in_queries].
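
In BigQuery the equivalent behaviour is a table-level option rather than a session setting; a hedged example with made-up dataset and table names:

{code:sql}
-- BigQuery: require a partition filter for every query against this table
CREATE TABLE mydataset.sales (
  amount NUMERIC,
  dt DATE
)
PARTITION BY dt
OPTIONS (require_partition_filter = TRUE);

-- Queries on mydataset.sales that do not filter on dt are rejected.
{code}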

It would be nice to have this feature implemented in Spark when Hive support is enabled in a Spark session.
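
One possible shape for the feature, purely as a sketch: the configuration keys below do not exist in Spark today and are invented here only to illustrate the request, mirroring the Hive property names:

{code:sql}
-- Hypothetical Spark SQL session settings (NOT implemented; names are
-- illustrative only)
SET spark.sql.hive.strictChecks.noPartitionFilter=true;
SET spark.sql.hive.strictChecks.orderByNoLimit=true;
{code}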

> Implement Strict Mode to prevent querying the entire table
> ---------------------------------------------------------
>
>                 Key: SPARK-38319
>                 URL: https://issues.apache.org/jira/browse/SPARK-38319
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.2.1
>         Environment: [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816]
>            Reporter: dimtiris kanoute
>            Priority: Minor
>              Labels: feature-request, improvement
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org