Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/08 22:12:22 UTC

[GitHub] [arrow-adbc] judahrand opened a new issue, #168: Implement BigQuery Driver

judahrand opened a new issue, #168:
URL: https://github.com/apache/arrow-adbc/issues/168

   This could be another interesting one, as the API can return Arrow-formatted data. Perhaps it could be implemented in Go, as I believe that's the first-class SDK?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] judahrand commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

judahrand commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1372419657

   > Absolutely, but the Arrow interface only applies to (effectively) full table scans with some filters ("BigQuery Storage" != "BigQuery"), so we will need to parse one of the alternative outputs for general queries. Thanks for the reference though!
   
   The Python BigQuery SDK uses a trick: it pushes any query into a table and then uses the BigQuery Storage API to fetch the result. We could use that here too in order to simplify things.
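
As a rough illustration of the trick described above, here is a minimal Python sketch. It assumes `client` behaves like a `google.cloud.bigquery.Client` and that the optional `bqstorage_client` is a BigQuery Storage read client; the helper name `query_to_arrow_batches` is hypothetical and is not part of any SDK. It only shows the shape of the flow (run the query, wait for the result table, stream it back as Arrow), not how an ADBC driver would have to do it.

```python
def query_to_arrow_batches(client, sql, bqstorage_client=None):
    """Run `sql` and yield its result as Arrow record batches.

    `client` is assumed to behave like google.cloud.bigquery.Client.
    `bqstorage_client` is assumed to be a BigQuery Storage read client;
    when it is supplied, the SDK can fetch rows through the Storage API.
    """
    job = client.query(sql)   # submit the query as a job
    rows = job.result()       # block until the result table is ready
    # Stream the result as Arrow record batches, one batch at a time.
    yield from rows.to_arrow_iterable(bqstorage_client=bqstorage_client)
```

With the real SDK, `client` would typically be `bigquery.Client()` and `bqstorage_client` would be `bigquery_storage.BigQueryReadClient()`, both from the `google.cloud` packages. Whether the extra materialization step is acceptable cost-wise is a separate question.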


[GitHub] [arrow-adbc] lidavidm commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

lidavidm commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1372468016

   Oops, clearly I don't understand BigQuery well enough. Thanks (again) for digging into this!


[GitHub] [arrow-adbc] lidavidm commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

lidavidm commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1315904113

   (Sorry for the delay.) Go might be interesting just to prove it out quickly. It may also be interesting to see Go implement the C interface and build an embeddable shared/static library to reduce the maintenance costs. (Right now Go can bind to the C interface, but not yet the other way.)


[GitHub] [arrow-adbc] judahrand commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

judahrand commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1310472906

   Yeah, the Go BQ/BQS SDK has docs on how to do this: https://github.com/alvarowolfx/golang-samples/blob/1da1b3088e0db924ccfad402108131ca0daabd0b/bigquery/bigquery_storage_quickstart/main.go#L329-L369


[GitHub] [arrow-adbc] lidavidm commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

lidavidm commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1372431935

   Ah, interesting. That would be great, then. Thanks for pointing that out.
   
   I assume that would have cost/pricing implications though, and requires you to materialize the result before reading it?


[GitHub] [arrow-adbc] paleolimbot commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

paleolimbot commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1326589531

   Obviously the Arrow interface is preferable, but I thought I'd post the C++ that the bigrquery R package uses to parse the JSON since the output data structure is pretty similar and I happen to know where it lives: https://github.com/r-dbi/bigrquery/blob/main/src/BqField.cpp


[GitHub] [arrow-adbc] lidavidm commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

lidavidm commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1327873012

   Absolutely, but the Arrow interface only applies to (effectively) full table scans with some filters ("BigQuery Storage" != "BigQuery"), so we will need to parse one of the alternative outputs for general queries.


[GitHub] [arrow-adbc] judahrand commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

judahrand commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1374180127

   Another useful source of inspiration - the Go SDK is in the process of implementing the same fast path as the Python SDK: https://github.com/googleapis/google-cloud-go/pull/6822
   
   To make the Go implementation straightforward, interfaces on top of the changes made in that PR would need to add a method analogous to Python's `RowIterator.to_arrow_iterable`. That doesn't seem like a stretch, however.


[GitHub] [arrow-adbc] lidavidm commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

lidavidm commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1310455120

   BigQuery Storage sends Arrow over gRPC so it could be done natively for all of C++, Go, and Java. It would be interesting. Though, BQS can't evaluate SQL. So we might want to add the 'inverse' of the ADBC ingest API, for scanning a table without issuing an explicit query (or, specifying that drivers can translate a Substrait read request to such a scan).
   
   I'm less familiar with the 'standard' BigQuery API. The REST API gives [row-oriented JSON](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults) which isn't as great.
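
To make the "row-oriented JSON" point concrete, here is a minimal Python sketch that pivots a `getQueryResults`-style response into columns. It assumes the documented response layout, where `schema.fields` lists the columns and each row is `{"f": [{"v": ...}, ...]}` with scalar values encoded as strings. The helper name `rows_to_columns` and the converter table are illustrative assumptions; nested `RECORD` and repeated fields are passed through unconverted.

```python
# Converters for a few scalar types; the REST API encodes these as strings.
_CONVERTERS = {
    "INTEGER": int,
    "INT64": int,
    "FLOAT": float,
    "FLOAT64": float,
    "BOOLEAN": lambda v: v == "true",
    "BOOL": lambda v: v == "true",
    "STRING": str,
}


def rows_to_columns(response):
    """Pivot a row-oriented query response into {column name: [values]}."""
    fields = response["schema"]["fields"]
    # Types without a converter are left exactly as they arrived.
    converters = [_CONVERTERS.get(f["type"], lambda v: v) for f in fields]
    columns = {f["name"]: [] for f in fields}
    for row in response.get("rows", []):
        for field, convert, cell in zip(fields, converters, row["f"]):
            value = cell["v"]
            # A JSON null stays None instead of being converted.
            columns[field["name"]].append(
                None if value is None else convert(value)
            )
    return columns
```

Every cell is visited and converted individually, which is the per-value overhead that a columnar Arrow stream avoids.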


Re: [I] Implement BigQuery Driver [arrow-adbc]

Posted by "josevalim (via GitHub)" <gi...@apache.org>.

josevalim commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-2028661944

   I believe the Go client now exposes the Arrow iterator and data: https://github.com/googleapis/google-cloud-go/pull/8506 (which is likely using [the RPC API](https://cloud.google.com/bigquery/docs/reference/storage) to read the data).


[GitHub] [arrow-adbc] zeroshade commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

zeroshade commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1374279903

   A nice aspect is that the Go BigQuery quickstart example (https://github.com/GoogleCloudPlatform/golang-samples/blob/main/bigquery/bigquery_storage_quickstart/main.go) actually uses the latest released version of Arrow (as opposed to Snowflake, which vendored a two-year-old version of Go Arrow right into their module).


[GitHub] [arrow-adbc] judahrand commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

judahrand commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1372442043

   This works because of this: 
   ![image](https://user-images.githubusercontent.com/17158624/210830704-940e0c73-afa9-4c9c-be87-c5ef6a557174.png)
   https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJob


[GitHub] [arrow-adbc] judahrand commented on issue #168: Implement BigQuery Driver

Posted by GitBox <gi...@apache.org>.

judahrand commented on issue #168:
URL: https://github.com/apache/arrow-adbc/issues/168#issuecomment-1372465888

   Other docs on the fact that BigQuery actually writes ALL query results to a table: https://cloud.google.com/bigquery/docs/cached-results

