Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/18 15:14:24 UTC

[GitHub] [incubator-pinot] cyrilou242 commented on pull request #5868: [TE] Added support for BigQuery as data source

cyrilou242 commented on pull request #5868:
URL: https://github.com/apache/incubator-pinot/pull/5868#issuecomment-675540235


   Sadly, BigQuery does not have a mock server.
   It is, however, possible to test against BigQuery's public datasets, which are free to query up to 1 TB per month.
   
   ### Requirements:
   A GCP project (named `test-project` below) with the BigQuery API enabled. See the "Before you begin" step here:  
   https://cloud.google.com/bigquery/docs/quickstarts/quickstart-command-line#before-you-begin
   
   ### Prepare the config file
   Create the relevant `data-sources-config.yml`:
   ```yaml
   dataSourceConfigs:
     - className: org.apache.pinot.thirdeye.datasource.sql.SqlThirdEyeDataSource
       properties:
         BigQuery:
           - db:
               test: jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=test-project;OAuthType=3;
             driver: com.simba.googlebigquery.jdbc42.Driver
   ```
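The `test` URL above is a JDBC endpoint followed by semicolon-separated `Key=Value` connection properties. As a minimal sketch, the URL for another project could be composed and checked as shown here. The helper names and the `BASE_ENDPOINT` constant are hypothetical and are not part of ThirdEye or the Simba driver.

```python
# Hypothetical helpers: compose and split a Simba BigQuery JDBC URL in the same
# shape as the `test` entry of data-sources-config.yml.
BASE_ENDPOINT = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2"


def build_bigquery_jdbc_url(project_id: str, oauth_type: int = 3) -> str:
    """Return '<endpoint>;ProjectId=...;OAuthType=...;' like the config file.

    OAuthType=3 selects Application Default Credentials, as in the quickstart.
    """
    props = {"ProjectId": project_id, "OAuthType": str(oauth_type)}
    # Each property is written as Key=Value followed by ';', which also
    # reproduces the trailing ';' of the URL in the config.
    return BASE_ENDPOINT + ";" + "".join(f"{k}={v};" for k, v in props.items())


def parse_bigquery_jdbc_url(url: str) -> dict:
    """Split a URL of the shape above back into its connection properties."""
    endpoint, _, rest = url.partition(";")
    if endpoint != BASE_ENDPOINT:
        raise ValueError(f"unexpected endpoint: {endpoint}")
    # Ignore the empty string produced by the trailing ';'.
    return dict(pair.split("=", 1) for pair in rest.split(";") if pair)


if __name__ == "__main__":
    url = build_bigquery_jdbc_url("test-project")
    print(url)
    print(parse_bigquery_jdbc_url(url))
```

Only `ProjectId` changes between projects; the resulting string still has to be pasted into `data-sources-config.yml` by hand.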
   For this quickstart, the authentication method is Application Default Credentials (`OAuthType=3` in the URL above).
   If the Cloud SDK was installed normally, the driver should authenticate as the project's user without further setup.
   See https://www.simba.com/products/BigQuery/doc/JDBC_InstallGuide/content/jdbc/bq/authenticating/appdefaultcred.htm
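Before launching, it can help to confirm that Application Default Credentials exist on the machine. The sketch below is a rough pre-flight check under stated assumptions. It only looks for a credentials file in the usual places: the `GOOGLE_APPLICATION_CREDENTIALS` variable first, then the gcloud config directory (honoring `CLOUDSDK_CONFIG`). It does not validate the file, and it ignores the GCE metadata server that ADC can also fall back to. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

ADC_FILENAME = "application_default_credentials.json"


def gcloud_config_dir() -> Path:
    """Directory where `gcloud` keeps its configuration (assumed defaults)."""
    override = os.environ.get("CLOUDSDK_CONFIG")
    if override:
        return Path(override)
    if os.name == "nt":
        # On Windows gcloud uses %APPDATA%\gcloud; falling back to the home
        # directory when APPDATA is unset is an assumption of this sketch.
        return Path(os.environ.get("APPDATA", str(Path.home()))) / "gcloud"
    return Path.home() / ".config" / "gcloud"


def adc_candidate_paths() -> List[Path]:
    """Credential files to look for, in the order this sketch checks them."""
    paths = []
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        paths.append(Path(explicit))
    paths.append(gcloud_config_dir() / ADC_FILENAME)
    return paths


def find_adc_file() -> Optional[Path]:
    """Return the first candidate that exists as a file, or None.

    The file's contents are not checked.
    """
    for path in adc_candidate_paths():
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_adc_file()
    print(found if found else "No ADC file found; try `gcloud auth application-default login`.")
```

If `find_adc_file()` returns `None`, running `gcloud auth application-default login` is the usual way to create the file.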
   
   ### Launch ThirdEye
   `./run-frontend.sh`
   
   ### Create a metric using a public dataset
   Querying the table on the columns below processes about 52 KB of data per query.  
   The first 1 TB of query processing per month is free, so this quickstart should stay within the free tier.  
   See https://cloud.google.com/bigquery/public-data
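To put the cost claim in numbers, the sketch below divides the free tier by the bytes billed per query. The assumptions should be checked against the current BigQuery pricing page. It assumes a free tier of 1 TiB of on-demand query processing per month and a minimum of 10 MiB billed per table referenced by an on-demand query. It also treats "52 KB" as 52 × 1024 bytes.

```python
# Rough free-tier budget for the quickstart metric, under stated assumptions.
FREE_TIER_BYTES = 2**40        # assumption: 1 TiB of on-demand processing per month
MIN_BILLED_BYTES = 10 * 2**20  # assumption: 10 MiB minimum billed per table referenced
BYTES_PER_QUERY = 52 * 2**10   # ~52 KB scanned per query, as stated in the text


def queries_per_month(bytes_per_query: int, min_billed: int = 0) -> int:
    """Number of identical queries that fit in the free tier when each one is
    billed max(bytes_per_query, min_billed) bytes."""
    billed = max(bytes_per_query, min_billed)
    return FREE_TIER_BYTES // billed


if __name__ == "__main__":
    print(queries_per_month(BYTES_PER_QUERY))                    # 20648881 (no minimum)
    print(queries_per_month(BYTES_PER_QUERY, MIN_BILLED_BYTES))  # 104857 (10 MiB minimum)
```

Even with the minimum applied, on the order of a hundred thousand such queries per month fit in the free tier. That should be ample for trying out root cause analysis.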
   
   Go to http://localhost:1426/app/#/self-serve/import-sql-metric and create the metric with the following settings:
   ```
   Table name: `bigquery-public-data.google_analytics_sample.ga_sessions_20170701`  (keep the backticks) 
   TimeColumn: visitStartTime
   Timezone: UTC
   Time Format: EPOCH
   Time granularity: 1  SECONDS
   
   Dimension 0: device.browser
   
   Metrics 0: totals.pageviews
   Aggregation Method: SUM
   ```
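With Time Format EPOCH, a granularity of 1 SECONDS and timezone UTC, `visitStartTime` is read as seconds since 1970-01-01T00:00:00Z. The backticks around the table name are needed because the project ID `bigquery-public-data` contains hyphens, which BigQuery standard SQL only accepts in quoted identifiers. The sketch below computes the epoch bounds of a UTC day and builds a hypothetical query roughly equivalent to the metric (SUM of `totals.pageviews` grouped by `device.browser`). It is an illustration only, not the SQL ThirdEye actually generates, and the table's sessions are not guaranteed to line up exactly with UTC day boundaries.

```python
from datetime import date, datetime, time, timedelta, timezone

TABLE = "`bigquery-public-data.google_analytics_sample.ga_sessions_20170701`"


def utc_day_bounds(day: date) -> tuple:
    """Return [start, end) of a UTC day as epoch seconds (EPOCH, 1 SECONDS)."""
    start = datetime.combine(day, time(0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def approximate_metric_query(start_epoch: int, end_epoch: int) -> str:
    """Hypothetical SQL roughly matching the metric, split by the dimension.

    This is NOT the query ThirdEye itself generates.
    """
    return (
        "SELECT device.browser AS browser, SUM(totals.pageviews) AS pageviews\n"
        f"FROM {TABLE}\n"
        f"WHERE visitStartTime >= {start_epoch} AND visitStartTime < {end_epoch}\n"
        "GROUP BY browser"
    )


if __name__ == "__main__":
    start, end = utc_day_bounds(date(2017, 7, 1))
    print(start, end)  # 1498867200 1498953600
    print(approximate_metric_query(start, end))
```

Running the printed query in the BigQuery console may serve as a rough sanity check of the totals ThirdEye reports for that day.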
   
   ### Test
   Go to Root Cause Analysis at http://localhost:1426/app/#/rootcause and select your metric.
   
   Let me know whether integration tests are mandatory and whether I should add the public dataset example to the docs.

