Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/08/12 20:01:16 UTC

[GitHub] [airflow] mik-laj commented on a change in pull request #5566: [AIRFLOW-4935] Add method in the bigquery hook to list tables in a dataset

URL: https://github.com/apache/airflow/pull/5566#discussion_r313102815
 
 

 ##########
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##########
 @@ -1718,6 +1718,79 @@ def get_datasets_list(self, project_id=None):
 
         return datasets_list
 
+    def get_dataset_tables_list(self, dataset_id, project_id=None, table_prefix=None, max_results=None):
+        """
+        Method returns the list of tables in a BigQuery dataset. If table_prefix is specified, only tables whose
+        IDs begin with it are returned.
+
+        .. seealso::
+            For more information, see:
+            https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list
+
+        :param dataset_id: The BigQuery Dataset ID
+        :type dataset_id: str
+        :param project_id: The GCP Project ID
+        :type project_id: str
+        :param table_prefix: Only tables whose ID begins with this prefix are returned
+        :type table_prefix: str
+        :param max_results: The maximum number of results to return in a single response page
+        :type max_results: int
+        :return: dataset_tables_list
+        :rtype: list(tableReference)
+
+            Example of returned dataset_tables_list:
+
+                    [
+                        {
+                            "projectId": "project1",
+                            "datasetId": "dataset1",
+                            "tableId": "table1"
+                        },
+                        {
+                            "projectId": "project1",
+                            "datasetId": "dataset1",
+                            "tableId": "table2"
+                        }
+                    ]
+        """
+        dataset_project_id = project_id if project_id else self.project_id
+
+        optional_params = {}
+        if max_results:
+            optional_params['maxResults'] = max_results
+
+        dataset_tables_list = []
+        next_page_token = None
+        while True:
+            if next_page_token:
+                optional_params['pageToken'] = next_page_token
+
+            try:
+                tables_list_resp = self.service.tables().list(
+                    projectId=dataset_project_id,
+                    datasetId=dataset_id,
+                    **optional_params).execute()
 
 Review comment:
  Good practice is to retry requests that fail with an error response.
  For examples, please run the following command:
  `grep -r execute airflow/`
  More information is also available in GCP's official integration guide.
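The reviewer's suggestion can be illustrated with a minimal sketch of retrying with exponential backoff. The `googleapiclient` request objects returned by calls such as `self.service.tables().list(...)` already accept `execute(num_retries=...)`, and the suggested `grep` would likely show other Airflow GCP hooks passing a retry count there; treat that as an assumption about this code base rather than something confirmed here. The helper `call_with_retries` below is hypothetical and only shows the general pattern for any callable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    num_retries: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``request()``; on a retryable exception, back off and try again.

    Makes at most ``num_retries + 1`` attempts. Before retry n (0-based) it waits
    base_delay * 2**n seconds plus up to base_delay seconds of random jitter.
    The last exception is re-raised once the retries are used up.
    """
    if num_retries < 0:
        raise ValueError("num_retries must be >= 0")
    for attempt in range(num_retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == num_retries:
                raise  # out of retries: surface the original error
            # Exponential backoff with jitter so concurrent clients do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

In practice `retry_on` should be narrowed to transient failures (for example `googleapiclient.errors.HttpError` with a 5xx or 429 status) rather than every `Exception`; where the client's built-in `num_retries` already covers the case, it is the simpler choice.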

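The `while True` loop in the diff follows the usual pattern for paginated list endpoints: request a page, collect its items, and resend the request with the returned `nextPageToken` as the `pageToken` query parameter until no token comes back. Below is a minimal sketch, assuming responses shaped like the `tables.list` REST response (a `tables` array whose entries carry a `tableReference`, plus an optional `nextPageToken`). The `list_page` callable is hypothetical, and applying `table_prefix` client-side is an assumption about how the finished method would use it, since the hunk above ends before that point.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def list_table_references(
    list_page: Callable[[Optional[str]], Page],
    table_prefix: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Collect tableReference dicts from every page of a tables.list-style response.

    ``list_page(page_token)`` returns one response page; ``None`` requests the first page.
    """
    references: List[Dict[str, str]] = []
    page_token: Optional[str] = None
    while True:
        page = list_page(page_token)
        # An empty dataset may omit the "tables" key entirely.
        for table in page.get("tables", []):
            ref = table["tableReference"]
            if table_prefix is None or ref["tableId"].startswith(table_prefix):
                references.append(ref)
        page_token = page.get("nextPageToken")
        if not page_token:  # no token means this was the last page
            return references
```

Against the real client, `list_page` could be something like `lambda token: service.tables().list(projectId=project_id, datasetId=dataset_id, **({'pageToken': token} if token else {})).execute(num_retries=5)`; a per-page `maxResults` option would go in the same keyword arguments.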