You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2021/12/29 16:53:19 UTC

[GitHub] [superset] riordan commented on issue #17299: Select * Limit is DANGEROUS in BigQuery

riordan commented on issue #17299:
URL: https://github.com/apache/superset/issues/17299#issuecomment-1002686382


   Additional context: BigQuery charges users based (near-exclusively) on the count of bytes loaded into memory. As such, they _strongly_ [discourage the use of `SELECT *`](https://cloud.google.com/bigquery/docs/best-practices-costs#avoid_select_) to control costs, yet `SELECT *...` is a great convention to help users bootstrap their own queries.
   
   The ways around this are:
   1. Large datasets are encouraged to be partitioned (usually by time)
   2.  Selecting only from the needed columns
   
   In practice, using the latest partition is the most practical way to constrain costs on bootstrapped queries (`SELECT * FROM table WHERE date={$latestpartition}`). This is how the [BQ UI](https://console.cloud.google.com/bigquery) (and most BQ-aware SQL tools) limit costs.
   
   It's also possible to store each partition as a separate table in a dataset and query it [hive-style](https://cloud.google.com/bigquery/docs/hive-partitioned-queries-gcs), and it's the way external data is partition queried in BQ. However, this has the disadvantage (for Google) of making it harder for users to write lots of queries that bill the most expensive possible way.
   
   I once came across a single query that cost $37,000 USD:
   `SELECT * FROM eventlogs* LIMIT 1;`
   
   A single 37k query, going back over years and trillions of events... just to fetch the table schema 🤦‍♂️🤣.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org