Posted to user@arrow.apache.org by Damien Chaillou <da...@gmail.com> on 2020/05/13 00:43:41 UTC

Example for Apache Arrow Flight

Hi!

I'm currently playing with Apache Arrow Flight in Java and can't get my head
around how to implement something.
In the *doGet* method, for example, I'm making simple JDBC calls whose results
I would like to stream.
If I understand correctly, the *FlightData*'s body should be an ArrowMessage
serialised as a ByteString (?) that I build from the ResultSet of my JDBC call.
I thought of using the JdbcToArrow
<https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java#L149>
helper class, but I can't find any example of how to do this.
I came up with a few questions:

   - What is the data we must set in dataBody?
   - If those are ArrowMessages, how can I map a ResultSet to this type?
   - How do we serialise typed objects like ArrowMessage, Schema ... to
   ByteString?


Could anyone please point me to a piece of code, a blog post, or anything else?

My toy project would be a generic proxy server in front of any database
with an available JDBC driver, streaming query results over Arrow Flight
(gRPC).

Cheers, thanks!


Damien

Re: Example for Apache Arrow Flight

Posted by Micah Kornfield <em...@gmail.com>.
Hi Damien,
I'm not an expert, but I believe populating the bytes field should be all
that is necessary. If you want to be sure, there are existing Flight
integration tests that you could substitute your final code into [1][2].

It is worth noting that quite a bit of effort went into avoiding memory
copies when sending bytes over the wire with gRPC in Java (the logic can be
seen in ArrowMessage [3]).

To answer your other questions:

>    - What is the data we must set in dataBody?

I believe this is a serialized Message. VectorUnloader handles extracting
data from a VectorSchemaRoot for serialization.

>    - If those are ArrowMessages, how can I map a ResultSet to this type?

ResultSet -> VectorSchemaRoot (via the JDBC contrib library) ->
VectorUnloader -> bytes. It is worth noting that I think we left the JDBC
library in a state where it creates a new VectorSchemaRoot each
time. If this is the case, we should probably change it to reuse an
existing VectorSchemaRoot.
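
As a conceptual illustration of that pipeline, here is a minimal sketch in
plain Java with no Arrow dependency. It only mimics two steps: pivoting
row-oriented data into columns (roughly what the JDBC adapter does when it
fills a VectorSchemaRoot) and packing a fixed-width column into one
contiguous little-endian buffer (similar in spirit to the buffers that
VectorUnloader collects for serialization). ColumnPackSketch and its methods
are hypothetical names, and the output is not the Arrow IPC format: there is
no metadata header, validity bitmap, or padding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Arrow.
final class ColumnPackSketch {

    /** Pivots row-oriented data (one int[] per row) into one int[] per column. */
    static int[][] pivot(List<int[]> rows, int columnCount) {
        int[][] columns = new int[columnCount][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            for (int c = 0; c < columnCount; c++) {
                columns[c][r] = rows.get(r)[c];
            }
        }
        return columns;
    }

    /** Packs one column into a contiguous little-endian buffer, 4 bytes per value. */
    static byte[] pack(int[] column) {
        ByteBuffer buffer = ByteBuffer
                .allocate(column.length * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int value : column) {
            buffer.putInt(value);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        // Three rows of (id, amount), standing in for rows read from a ResultSet.
        List<int[]> rows = Arrays.asList(
                new int[] {1, 10}, new int[] {2, 20}, new int[] {3, 30});
        int[][] columns = pivot(rows, 2);
        System.out.println(Arrays.toString(columns[1])); // prints [10, 20, 30]
        System.out.println(pack(columns[0]).length);     // prints 12
    }
}
```

With the real library, the vectors inside the VectorSchemaRoot already hold
their data in buffers like this, which is why the unloading step can avoid
copying values one by one.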

It might also be useful to review the Java prose documentation, which is
linked from the Java README [4].

Hope this helps.

Micah

[1]
https://github.com/apache/arrow/blob/0188e45bbe688c45f10032e9c37fbedab755fa71/java/flight/flight-core/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestServer.java
[2]
https://github.com/apache/arrow/blob/3567dcf3a314f009e56b4e310f5eaa0e49c6469c/dev/archery/archery/integration/tester_java.py

[3]
https://github.com/apache/arrow/blob/39cc7434479ddce565e0a0fe35f416a9cd992700/java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java
[4] https://mail.google.com/mail/u/0/#inbox/FMfcgxwHNMZSxGXMbrZDFdwvrdZfGTdF





Re: Example for Apache Arrow Flight

Posted by Damien Chaillou <da...@gmail.com>.
Hi Andy,
thanks for the VERY quick response and the links, I will study them.
However, I had in mind an implementation like the one you wrote in the Rust
examples folder [1], that is, implementing the FlightService trait
from the gRPC service. This is why I meant to build a FlightData directly
[2]. Would it mean I "just" have to transform my VectorSchemaRoot into a
ByteString (set in the dataBody of the FlightData object) that will be
streamed over gRPC?

(I wanted to use Scala AkkaStream gRPC to implement my server, because I
don't think *org.apache.arrow.flight.FlightServer* will fit my needs.)

Or maybe I misunderstood how a Flight server should be implemented?

Cheers,

Damien

[1]
https://github.com/andygrove/arrow/blob/master/rust/datafusion/examples/flight_server.rs
[2] https://github.com/apache/arrow/blob/master/format/Flight.proto#L300


Re: Example for Apache Arrow Flight

Posted by Andy Grove <an...@gmail.com>.
Hi Damien,

Here is a brief answer that hopefully at least points you in the right
direction.

You need to use the VectorSchemaRoot class to build batches of data. There
is some documentation on how to do that [1]. Then, in your FlightProducer
implementation, you need to pass the batches to the
FlightProducer.ServerStreamListener using the "start" and "putNext" methods
when batches are ready to be sent. There is sample code in the Arrow repo
[2] and there is a Kotlin example that I wrote here [3].
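
To make that call sequence concrete, here is a rough sketch in plain Java,
without the Arrow dependency. Batch and Listener are hypothetical stand-ins
for VectorSchemaRoot and FlightProducer.ServerStreamListener (only the method
names start, putNext, completed, and error mirror the real listener), and the
in-memory row iterator stands in for a JDBC ResultSet. It shows the order of
calls: start once, putNext after each refill of the same batch, then
completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class FlightStreamSketch {

    /** Hypothetical stand-in for VectorSchemaRoot: one reusable, mutable batch. */
    static final class Batch {
        final List<String> columns;
        final List<Object[]> rows = new ArrayList<>();

        Batch(List<String> columns) {
            this.columns = columns;
        }
    }

    /** Hypothetical stand-in for FlightProducer.ServerStreamListener. */
    interface Listener {
        void start(Batch root);   // register the batch that will be refilled
        void putNext();           // send whatever the registered batch holds now
        void completed();         // signal the end of the stream
        void error(Throwable t);  // abort the stream
    }

    /** Refills the same Batch repeatedly and signals the listener after each fill. */
    static void stream(List<String> columns, Iterator<Object[]> source,
                       int batchSize, Listener listener) {
        Batch root = new Batch(columns);
        try {
            listener.start(root);
            while (source.hasNext()) {
                root.rows.clear(); // reuse the batch instead of allocating a new one
                while (source.hasNext() && root.rows.size() < batchSize) {
                    root.rows.add(source.next());
                }
                listener.putNext();
            }
            listener.completed();
        } catch (RuntimeException e) {
            listener.error(e);
        }
    }

    /** Runs the sketch against five in-memory rows and returns the recorded calls. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        Listener recorder = new Listener() {
            private Batch root;

            public void start(Batch root) {
                this.root = root;
                events.add("start " + root.columns);
            }

            public void putNext() {
                events.add("putNext " + root.rows.size());
            }

            public void completed() {
                events.add("completed");
            }

            public void error(Throwable t) {
                events.add("error " + t.getMessage());
            }
        };
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new Object[] {i, "name-" + i});
        }
        stream(Arrays.asList("id", "name"), rows.iterator(), 2, recorder);
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In a real FlightProducer, a loop of this shape would presumably sit inside
getStream, filling an actual VectorSchemaRoot before each putNext() call.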

Andy.

[1] https://arrow.apache.org/docs/java/ipc.html
[2]
https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java
[3]
https://github.com/ballista-compute/ballista/blob/master/jvm/executor/src/main/kotlin/BallistaFlightProducer.kt
