Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2019/07/27 03:43:48 UTC

BigQuery Storage API now supports Arrow

Hi Arrow Dev,
As a follow-up to an old thread [1] on working with BigQuery and Arrow, I
just wanted to share some work that Brian Hulette and I helped out with.

I'm happy to announce there is now preliminary support for reading Arrow
data in the BigQuery Storage API [2].  Python library support is available
in the latest release of google-cloud-bigquery-storage [3][4].

Caveats:
- Small cached tables are not supported (same with Avro)
- Row filters aren't supported yet.

Cheers,
Micah

[1]
https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
[2] https://cloud.google.com/bigquery/docs/reference/storage/
[3] https://pypi.org/project/google-cloud-bigquery-storage/
[4]
https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow

Re: BigQuery Storage API now supports Arrow

Posted by Micah Kornfield <em...@gmail.com>.
>
> That’s awesome!! It’s pretty surreal to request a feature from google and
> have it built out.

I hope this is beneficial to customers in general.  Thank you for filing the
request.

> If I'm reading the code correctly, it looks like you are transporting the
> IPC payload in the protobuf format of the bigquery storage API
>
> https://github.com/googleapis/google-cloud-python/blob/3d324389b92d43e52486f0fe2aca8b41e950640c/bigquery_storage/google/cloud/bigquery_storage_v1beta1/proto/arrow.proto


Yes, the data is returned in a oneof [1] on a ReadRowsResponse. One needs
to specify the data format when creating the read session [2].

[1]
https://github.com/googleapis/google-cloud-python/blob/3d324389b92d43e52486f0fe2aca8b41e950640c/bigquery_storage/google/cloud/bigquery_storage_v1beta1/proto/storage.proto#L294
[2]
https://github.com/googleapis/google-cloud-python/blob/3d324389b92d43e52486f0fe2aca8b41e950640c/bigquery_storage/google/cloud/bigquery_storage_v1beta1/proto/storage.proto#L182
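
To illustrate the oneof mentioned above, here is a minimal pure-Python
sketch of how a client might dispatch on whichever payload a response
carries. The classes are hypothetical stand-ins, not the generated protobuf
classes, and the field names are assumptions for illustration; the linked
.proto files are the authoritative definitions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ArrowRecordBatch:
    # Hypothetical stand-in: an Arrow IPC payload carried as raw bytes.
    serialized_record_batch: bytes
    row_count: int


@dataclass
class AvroRows:
    # Hypothetical stand-in: Avro-encoded rows carried as raw bytes.
    serialized_binary_rows: bytes
    row_count: int


@dataclass
class ReadRowsResponse:
    # Emulates a protobuf oneof: at most one of these fields may be set.
    avro_rows: Optional[AvroRows] = None
    arrow_record_batch: Optional[ArrowRecordBatch] = None

    def which_rows(self) -> Optional[str]:
        """Return the name of the populated field, or None if neither is."""
        names = ("avro_rows", "arrow_record_batch")
        populated = [n for n in names if getattr(self, n) is not None]
        if len(populated) > 1:
            raise ValueError("oneof violated: " + ", ".join(populated))
        return populated[0] if populated else None


def payload_bytes(response: ReadRowsResponse) -> Tuple[str, bytes]:
    """Return (format name, serialized payload) for one response."""
    kind = response.which_rows()
    if kind == "arrow_record_batch":
        return "ARROW", response.arrow_record_batch.serialized_record_batch
    if kind == "avro_rows":
        return "AVRO", response.avro_rows.serialized_binary_rows
    raise ValueError("response carries no row payload")


if __name__ == "__main__":
    resp = ReadRowsResponse(
        arrow_record_batch=ArrowRecordBatch(b"\x01\x02", row_count=1)
    )
    print(payload_bytes(resp))
```

Since the format is chosen when the read session is created, a real client
would normally expect every response in a session to populate the same
branch; the check above only guards against an empty response.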

- Micah

On Sat, Jul 27, 2019 at 1:33 PM Wes McKinney <we...@gmail.com> wrote:

> Very nice!
>
> If I'm reading the code correctly  looks like you are transporting the
> IPC payload in the protobuf format of the bigquery storage API
>
>
> https://github.com/googleapis/google-cloud-python/blob/3d324389b92d43e52486f0fe2aca8b41e950640c/bigquery_storage/google/cloud/bigquery_storage_v1beta1/proto/arrow.proto
>
> Seems very reasonable. Glad you were able to pull this off!
>
> On Sat, Jul 27, 2019 at 2:55 PM Jonathan Chiang <ch...@gmail.com>
> wrote:
> >
> > Hi Micah,
> >
> > That’s awesome!! It’s pretty surreal to request a feature from google
> and have it built out.
> >
> > Thanks,
> > Jonathan
> >
> > > On Jul 26, 2019, at 8:43 PM, Micah Kornfield <em...@gmail.com>
> wrote:
> > >
> > > Hi Arrow Dev,
> > > As a follow-up to an old thread [1] on working with BigQuery and
> Arrow. I
> > > just wanted to share some work that Brian Hulette and I helped out
> with.
> > >
> > > I'm happy to announce there is now preliminary support for reading
> Arrow
> > > data in the BigQuery Storage API [1].  Python library support is
> available
> > > in the latest release of google-cloud-bigquery-storage [2][3].
> > >
> > > Caveats:
> > > - Small cached tables are not supported (same with Avro)
> > > - Row filters aren't supported yet.
> > >
> > > Cheers,
> > > Micah
> > >
> > > [1]
> > >
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> > > [2] https://cloud.google.com/bigquery/docs/reference/storage/
> > > [3] https://pypi.org/project/google-cloud-bigquery-storage/
> > > [4]
> > >
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
>

Re: BigQuery Storage API now supports Arrow

Posted by Wes McKinney <we...@gmail.com>.
Very nice!

If I'm reading the code correctly, it looks like you are transporting the
IPC payload in the protobuf format of the BigQuery Storage API

https://github.com/googleapis/google-cloud-python/blob/3d324389b92d43e52486f0fe2aca8b41e950640c/bigquery_storage/google/cloud/bigquery_storage_v1beta1/proto/arrow.proto

Seems very reasonable. Glad you were able to pull this off!

On Sat, Jul 27, 2019 at 2:55 PM Jonathan Chiang <ch...@gmail.com> wrote:
>
> Hi Micah,
>
> That’s awesome!! It’s pretty surreal to request a feature from google and have it built out.
>
> Thanks,
> Jonathan
>
> > On Jul 26, 2019, at 8:43 PM, Micah Kornfield <em...@gmail.com> wrote:
> >
> > Hi Arrow Dev,
> > As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> > just wanted to share some work that Brian Hulette and I helped out with.
> >
> > I'm happy to announce there is now preliminary support for reading Arrow
> > data in the BigQuery Storage API [1].  Python library support is available
> > in the latest release of google-cloud-bigquery-storage [2][3].
> >
> > Caveats:
> > - Small cached tables are not supported (same with Avro)
> > - Row filters aren't supported yet.
> >
> > Cheers,
> > Micah
> >
> > [1]
> > https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> > [2] https://cloud.google.com/bigquery/docs/reference/storage/
> > [3] https://pypi.org/project/google-cloud-bigquery-storage/
> > [4]
> > https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow

Re: BigQuery Storage API now supports Arrow

Posted by Jonathan Chiang <ch...@gmail.com>.
Hi Micah,

That’s awesome!! It’s pretty surreal to request a feature from google and have it built out. 

Thanks,
Jonathan 

> On Jul 26, 2019, at 8:43 PM, Micah Kornfield <em...@gmail.com> wrote:
> 
> Hi Arrow Dev,
> As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> just wanted to share some work that Brian Hulette and I helped out with.
> 
> I'm happy to announce there is now preliminary support for reading Arrow
> data in the BigQuery Storage API [1].  Python library support is available
> in the latest release of google-cloud-bigquery-storage [2][3].
> 
> Caveats:
> - Small cached tables are not supported (same with Avro)
> - Row filters aren't supported yet.
> 
> Cheers,
> Micah
> 
> [1]
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> [2] https://cloud.google.com/bigquery/docs/reference/storage/
> [3] https://pypi.org/project/google-cloud-bigquery-storage/
> [4]
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow

Re: BigQuery Storage API now supports Arrow

Posted by Micah Kornfield <em...@gmail.com>.
>
> This is nice.  Reading the original ML thread [1], does this mean that
> high-speed Avro-to-Arrow parsing has become less important now?


I think this is still important from an Arrow perspective.   Avro is still
a very popular serialization format and probably the most popular one Arrow
doesn't yet have support for.

On Mon, Jul 29, 2019 at 6:03 AM Antoine Pitrou <an...@python.org> wrote:

>
> Hi Micah,
>
> Le 27/07/2019 à 05:43, Micah Kornfield a écrit :
> > Hi Arrow Dev,
> > As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> > just wanted to share some work that Brian Hulette and I helped out with.
> >
> > I'm happy to announce there is now preliminary support for reading Arrow
> > data in the BigQuery Storage API [1].  Python library support is
> available
> > in the latest release of google-cloud-bigquery-storage [2][3].
>
> This is nice.  Reading the original ML thread [1], does this mean that
> high-speed Avro-to-Arrow parsing has become less important now?
>
> Regards
>
> Antoine.
>
>
> >
> > Caveats:
> > - Small cached tables are not supported (same with Avro)
> > - Row filters aren't supported yet.
> >
> > Cheers,
> > Micah
> >
> > [1]
> >
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> > [2] https://cloud.google.com/bigquery/docs/reference/storage/
> > [3] https://pypi.org/project/google-cloud-bigquery-storage/
> > [4]
> >
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
> >
>

Re: BigQuery Storage API now supports Arrow

Posted by Antoine Pitrou <an...@python.org>.
Hi Micah,

Le 27/07/2019 à 05:43, Micah Kornfield a écrit :
> Hi Arrow Dev,
> As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> just wanted to share some work that Brian Hulette and I helped out with.
> 
> I'm happy to announce there is now preliminary support for reading Arrow
> data in the BigQuery Storage API [1].  Python library support is available
> in the latest release of google-cloud-bigquery-storage [2][3].

This is nice.  Reading the original ML thread [1], does this mean that
high-speed Avro-to-Arrow parsing has become less important now?

Regards

Antoine.


> 
> Caveats:
> - Small cached tables are not supported (same with Avro)
> - Row filters aren't supported yet.
> 
> Cheers,
> Micah
> 
> [1]
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> [2] https://cloud.google.com/bigquery/docs/reference/storage/
> [3] https://pypi.org/project/google-cloud-bigquery-storage/
> [4]
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
> 

Re: BigQuery Storage API now supports Arrow

Posted by Antoine Pitrou <an...@python.org>.
Le 29/07/2019 à 15:13, David Li a écrit :
> Ah, sorry, I was unclear - the performance issue is not with Flight at
> all, but with putting Arrow over gRPC naively.
> 
> At some point, we benchmarked gRPC-Python carrying Arrow data, and
> found that it only achieved ~half the throughput of Flight-Python. So
> implementing BigQuery-Flight would also avoid that performance
> pitfall, assuming the client library for BigQuery-Arrow uses
> gRPC-Python.
> 
> The reason we found is that since gRPC technically does not require
> Protobuf, it copies message payloads into a CPython bytestring, and
> then the Python code then turns around and hands that to Protobuf,
> which then copies data into its data structures and gives it back to
> Python

gRPC shouldn't need to copy the payload into a CPython bytestring.
Instead, it could instantiate a buffer-like Python object pointing to
the original data.  This is "easily" done in Cython, and gRPC-python
already uses Cython:
https://cython.readthedocs.io/en/latest/src/userguide/buffer.html
https://docs.python.org/3/c-api/buffer.html
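
The difference between copying a payload and exposing it through the
buffer protocol can be sketched in pure Python with memoryview. This is
only an illustration of the idea, not gRPC-Python's code; the bytearray
stands in for a transport-owned message buffer, and the helper names are
made up for this example.

```python
def copy_slice(buf, start, stop):
    """Return an independent bytes object (copies stop - start bytes)."""
    return bytes(memoryview(buf)[start:stop])


def view_slice(buf, start, stop):
    """Return a memoryview over the same memory (no payload copy)."""
    return memoryview(buf)[start:stop]


# Pretend this bytearray is a message buffer owned by the transport layer:
# a 4-byte header followed by the payload.
wire = bytearray(b"HDR:" + b"arrow-ipc-payload")

copied = copy_slice(wire, 4, len(wire))
viewed = view_slice(wire, 4, len(wire))

# Mutating the original buffer shows which result shares memory with it.
wire[4] = ord("A")

print(copied)            # unaffected: it owns its own copy of the bytes
print(viewed.tobytes())  # reflects the change: it points at the original
```

The view avoids the copy, but it also keeps the underlying buffer alive
and exposes later mutations, so whoever owns the buffer must not reuse it
while a view is still in use.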

Regards

Antoine.

Re: BigQuery Storage API now supports Arrow

Posted by David Li <li...@gmail.com>.
Ah, sorry, I was unclear - the performance issue is not with Flight at
all, but with putting Arrow over gRPC naively.

At some point, we benchmarked gRPC-Python carrying Arrow data, and
found that it only achieved ~half the throughput of Flight-Python. So
implementing BigQuery-Flight would also avoid that performance
pitfall, assuming the client library for BigQuery-Arrow uses
gRPC-Python.

The reason we found is that, since gRPC technically does not require
Protobuf, it copies message payloads into a CPython bytestring. The
Python code then turns around and hands that to Protobuf, which copies
the data into its own data structures and gives it back to Python. If
we implemented a BigQuery Flight backend in C++ and wrote Python
bindings, we could avoid all that.

Best,
David

On 7/29/19, Antoine Pitrou <so...@pitrou.net> wrote:
>
> Hi David,
>
> On Mon, 29 Jul 2019 09:06:52 -0400
> David Li <li...@gmail.com> wrote:
>>
>> If the current gRPC stub definitions are reasonably stable (in your
>> opinion), I might try implementing support. That might get reasonable
>> performance still, especially in Python (where I've found that a lot
>> of performance is lost copying messages into/out of CPython to work
>> with Protobuf & gRPC
>
> Can you elaborate on this performance issue?  Is it with our Flight
> Python bindings?
>
> Regards
>
> Antoine.
>
>
>

Re: BigQuery Storage API now supports Arrow

Posted by Antoine Pitrou <so...@pitrou.net>.
Hi David,

On Mon, 29 Jul 2019 09:06:52 -0400
David Li <li...@gmail.com> wrote:
> 
> If the current gRPC stub definitions are reasonably stable (in your
> opinion), I might try implementing support. That might get reasonable
> performance still, especially in Python (where I've found that a lot
> of performance is lost copying messages into/out of CPython to work
> with Protobuf & gRPC

Can you elaborate on this performance issue?  Is it with our Flight
Python bindings?

Regards

Antoine.



Re: BigQuery Storage API now supports Arrow

Posted by Micah Kornfield <em...@gmail.com>.
>
> If the current gRPC stub definitions are reasonably stable (in your
> opinion), I might try implementing support.

I would guess they are relatively stable, but I don't think I can make any
guarantees (as far as I know, no guarantees are made between beta and
GA API versions).  So while I would love to see this done, it might pay to
wait for GA to get some level of stability guarantees.


On Mon, Jul 29, 2019 at 6:06 AM David Li <li...@gmail.com> wrote:

> Hey Micah,
>
> There hasn't really been formal discussions of Flight "backends", but
> there has been some talk about supporting protocols besides gRPC
> (which is why the implementation tries to abstract away from gRPC). So
> it might be interesting to treat this as another "protocol" in Flight
> clients that can only read from BigQuery.
>
> If the current gRPC stub definitions are reasonably stable (in your
> opinion), I might try implementing support. That might get reasonable
> performance still, especially in Python (where I've found that a lot
> of performance is lost copying messages into/out of CPython to work
> with Protobuf & gRPC - presumably grpc-c++ wouldn't do that and/or we
> could do the read optimizations ourselves).
>
> Best,
> David
>
> On 7/27/19, Micah Kornfield <em...@gmail.com> wrote:
> > Hi David,
> >
> >> I see the original thread mentioned Flight support, do you think it'd
> >> be possible to support Flight natively? Or conversely, maybe this
> >> could be a candidate for a new Flight "backend" as has been discussed.
> >
> > Right now our main priority is addressing the caveats I mentioned above.
> > After that I would like to see if we can get some of the client side
> > optimizations Flight uses either into gRPC directly or perhaps
> specialized
> > clients.  Given my current backlog, I don't think these will happen
> anytime
> > soon.
> >
> > In my opinion (this is really above my pay-grade) native flight support
> > probably hinges on two things:
> > 1.  Customer demand for it.
> > 2.  The Flight APIs would likely need to conform to the Cloud API design
> > guidelines [1].  I haven't looked closely enough to see if there are any
> > incompatibilities between the two.
> >
> > I don't recall seeing the discussion on "new flight backends", could you
> > provide a pointer?  This might be a shorter path for support.  I'd also
> > like to make an adapter for the C++ Datasets API, but again, given my
> > current backlog it will take a while for this to happen.
> >
> > Thanks,
> > Micah
> >
> > [1] https://cloud.google.com/apis/design/
> >
> > On Sat, Jul 27, 2019 at 6:17 AM David Li <li...@gmail.com> wrote:
> >
> >> This is super awesome, thanks for sharing!
> >>
> >> I see the original thread mentioned Flight support, do you think it'd
> >> be possible to support Flight natively? Or conversely, maybe this
> >> could be a candidate for a new Flight "backend" as has been discussed.
> >>
> >> Best,
> >> David
> >>
> >> On 7/26/19, Micah Kornfield <em...@gmail.com> wrote:
> >> > Hi Arrow Dev,
> >> > As a follow-up to an old thread [1] on working with BigQuery and
> Arrow.
> >> > I
> >> > just wanted to share some work that Brian Hulette and I helped out
> >> > with.
> >> >
> >> > I'm happy to announce there is now preliminary support for reading
> >> > Arrow
> >> > data in the BigQuery Storage API [1].  Python library support is
> >> available
> >> > in the latest release of google-cloud-bigquery-storage [2][3].
> >> >
> >> > Caveats:
> >> > - Small cached tables are not supported (same with Avro)
> >> > - Row filters aren't supported yet.
> >> >
> >> > Cheers,
> >> > Micah
> >> >
> >> > [1]
> >> >
> >>
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> >> > [2] https://cloud.google.com/bigquery/docs/reference/storage/
> >> > [3] https://pypi.org/project/google-cloud-bigquery-storage/
> >> > [4]
> >> >
> >>
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
> >> >
> >>
> >
>

Re: BigQuery Storage API now supports Arrow

Posted by David Li <li...@gmail.com>.
Hey Micah,

There haven't really been formal discussions of Flight "backends", but
there has been some talk about supporting protocols besides gRPC
(which is why the implementation tries to abstract away from gRPC). So
it might be interesting to treat this as another "protocol" in Flight
clients, one that can only read from BigQuery.

If the current gRPC stub definitions are reasonably stable (in your
opinion), I might try implementing support. That might get reasonable
performance still, especially in Python (where I've found that a lot
of performance is lost copying messages into/out of CPython to work
with Protobuf & gRPC - presumably grpc-c++ wouldn't do that and/or we
could do the read optimizations ourselves).

Best,
David

On 7/27/19, Micah Kornfield <em...@gmail.com> wrote:
> Hi David,
>
>> I see the original thread mentioned Flight support, do you think it'd
>> be possible to support Flight natively? Or conversely, maybe this
>> could be a candidate for a new Flight "backend" as has been discussed.
>
> Right now our main priority is addressing the caveats I mentioned above.
> After that I would like to see if we can get some of the client side
> optimizations Flight uses either into gRPC directly or perhaps specialized
> clients.  Given my current backlog, I don't think these will happen anytime
> soon.
>
> In my opinion (this is really above my pay-grade) native flight support
> probably hinges on two things:
> 1.  Customer demand for it.
> 2.  The Flight APIs would likely need to conform to the Cloud API design
> guidelines [1].  I haven't looked closely enough to see if there are any
> incompatibilities between the two.
>
> I don't recall seeing the discussion on "new flight backends", could you
> provide a pointer?  This might be a shorter path for support.  I'd also
> like to make an adapter for the C++ Datasets API, but again, given my
> current backlog it will take a while for this to happen.
>
> Thanks,
> Micah
>
> [1] https://cloud.google.com/apis/design/
>
> On Sat, Jul 27, 2019 at 6:17 AM David Li <li...@gmail.com> wrote:
>
>> This is super awesome, thanks for sharing!
>>
>> I see the original thread mentioned Flight support, do you think it'd
>> be possible to support Flight natively? Or conversely, maybe this
>> could be a candidate for a new Flight "backend" as has been discussed.
>>
>> Best,
>> David
>>
>> On 7/26/19, Micah Kornfield <em...@gmail.com> wrote:
>> > Hi Arrow Dev,
>> > As a follow-up to an old thread [1] on working with BigQuery and Arrow.
>> > I
>> > just wanted to share some work that Brian Hulette and I helped out
>> > with.
>> >
>> > I'm happy to announce there is now preliminary support for reading
>> > Arrow
>> > data in the BigQuery Storage API [1].  Python library support is
>> available
>> > in the latest release of google-cloud-bigquery-storage [2][3].
>> >
>> > Caveats:
>> > - Small cached tables are not supported (same with Avro)
>> > - Row filters aren't supported yet.
>> >
>> > Cheers,
>> > Micah
>> >
>> > [1]
>> >
>> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
>> > [2] https://cloud.google.com/bigquery/docs/reference/storage/
>> > [3] https://pypi.org/project/google-cloud-bigquery-storage/
>> > [4]
>> >
>> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
>> >
>>
>

Re: BigQuery Storage API now supports Arrow

Posted by Micah Kornfield <em...@gmail.com>.
Hi David,

> I see the original thread mentioned Flight support, do you think it'd
> be possible to support Flight natively? Or conversely, maybe this
> could be a candidate for a new Flight "backend" as has been discussed.

Right now our main priority is addressing the caveats I mentioned above.
After that I would like to see if we can get some of the client side
optimizations Flight uses either into gRPC directly or perhaps specialized
clients.  Given my current backlog, I don't think these will happen anytime
soon.

In my opinion (this is really above my pay grade), native Flight support
probably hinges on two things:
1.  Customer demand for it.
2.  The Flight APIs would likely need to conform to the Cloud API design
guidelines [1].  I haven't looked closely enough to see if there are any
incompatibilities between the two.

I don't recall seeing the discussion on "new Flight backends". Could you
provide a pointer?  This might be a shorter path for support.  I'd also
like to make an adapter for the C++ Datasets API, but again, given my
current backlog it will take a while for this to happen.

Thanks,
Micah

[1] https://cloud.google.com/apis/design/

On Sat, Jul 27, 2019 at 6:17 AM David Li <li...@gmail.com> wrote:

> This is super awesome, thanks for sharing!
>
> I see the original thread mentioned Flight support, do you think it'd
> be possible to support Flight natively? Or conversely, maybe this
> could be a candidate for a new Flight "backend" as has been discussed.
>
> Best,
> David
>
> On 7/26/19, Micah Kornfield <em...@gmail.com> wrote:
> > Hi Arrow Dev,
> > As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> > just wanted to share some work that Brian Hulette and I helped out with.
> >
> > I'm happy to announce there is now preliminary support for reading Arrow
> > data in the BigQuery Storage API [1].  Python library support is
> available
> > in the latest release of google-cloud-bigquery-storage [2][3].
> >
> > Caveats:
> > - Small cached tables are not supported (same with Avro)
> > - Row filters aren't supported yet.
> >
> > Cheers,
> > Micah
> >
> > [1]
> >
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> > [2] https://cloud.google.com/bigquery/docs/reference/storage/
> > [3] https://pypi.org/project/google-cloud-bigquery-storage/
> > [4]
> >
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
> >
>

Re: BigQuery Storage API now supports Arrow

Posted by Fan Liya <li...@gmail.com>.
@Micah Kornfield

Awesome work! Big congratulations!

Best,
Liya Fan

On Sat, Jul 27, 2019 at 9:17 PM David Li <li...@gmail.com> wrote:

> This is super awesome, thanks for sharing!
>
> I see the original thread mentioned Flight support, do you think it'd
> be possible to support Flight natively? Or conversely, maybe this
> could be a candidate for a new Flight "backend" as has been discussed.
>
> Best,
> David
>
> On 7/26/19, Micah Kornfield <em...@gmail.com> wrote:
> > Hi Arrow Dev,
> > As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> > just wanted to share some work that Brian Hulette and I helped out with.
> >
> > I'm happy to announce there is now preliminary support for reading Arrow
> > data in the BigQuery Storage API [1].  Python library support is
> available
> > in the latest release of google-cloud-bigquery-storage [2][3].
> >
> > Caveats:
> > - Small cached tables are not supported (same with Avro)
> > - Row filters aren't supported yet.
> >
> > Cheers,
> > Micah
> >
> > [1]
> >
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> > [2] https://cloud.google.com/bigquery/docs/reference/storage/
> > [3] https://pypi.org/project/google-cloud-bigquery-storage/
> > [4]
> >
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
> >
>

Re: BigQuery Storage API now supports Arrow

Posted by David Li <li...@gmail.com>.
This is super awesome, thanks for sharing!

I see the original thread mentioned Flight support, do you think it'd
be possible to support Flight natively? Or conversely, maybe this
could be a candidate for a new Flight "backend" as has been discussed.

Best,
David

On 7/26/19, Micah Kornfield <em...@gmail.com> wrote:
> Hi Arrow Dev,
> As a follow-up to an old thread [1] on working with BigQuery and Arrow. I
> just wanted to share some work that Brian Hulette and I helped out with.
>
> I'm happy to announce there is now preliminary support for reading Arrow
> data in the BigQuery Storage API [1].  Python library support is available
> in the latest release of google-cloud-bigquery-storage [2][3].
>
> Caveats:
> - Small cached tables are not supported (same with Avro)
> - Row filters aren't supported yet.
>
> Cheers,
> Micah
>
> [1]
> https://lists.apache.org/thread.html/6d374dc6c948d3e84b1f0feda1d48eddf905a99c0ef569d46af7f7af@%3Cdev.arrow.apache.org%3E
> [2] https://cloud.google.com/bigquery/docs/reference/storage/
> [3] https://pypi.org/project/google-cloud-bigquery-storage/
> [4]
> https://googleapis.github.io/google-cloud-python/latest/bigquery_storage/gapic/v1beta1/reader.html#google.cloud.bigquery_storage_v1beta1.reader.ReadRowsIterable.to_arrow
>