Posted to dev@arrow.apache.org by ma...@markfarnan.com on 2020/08/12 09:29:08 UTC

Arrow Flight + Go, Arrow for Realtime

I'm looking at using Arrow for a realtime IoT project that includes use
cases both on the server and in the browser (transferring and using data
via WASM), and I have a few questions.

The language in use is Go.

Is anyone working on implementing Arrow Flight in Go? (According to the
feature matrix, nothing is ready yet, so I wanted to check.)

Has anyone tried using Apache Arrow in Go WASM (WebAssembly)? If so, were
there any issues?

Are there any pointers or documentation on using or extending Arrow for
realtime streaming cases? (Specifically, where a DataFrame is requested but
then needs to 'grow' as new data arrives, often at high speed.)

This is not language specific; I'm just trying to understand the right
pattern for using Arrow for this, and couldn't find much in the docs.

Regards

Mark.


Re: Arrow Flight + Go, Arrow for Realtime

Posted by Micah Kornfield <em...@gmail.com>.
Sorry to thread hijack.


> One key point is that we perform our own dictionary encoding of the data
> before generating the Arrow file, so basically all of the dimensional data
> in the Arrow file itself consists of just numbers (integers) that represent
> keys into an array of strings stored outside the Arrow file.


Just curious: is there a reason why you didn't use the built-in dictionary
support in the Arrow format?

-Micah

On Mon, Aug 17, 2020 at 9:02 PM Michael Stephenson <do...@gmail.com>
wrote:

> Having spent a few solid days looking at Finos Perspective a while back, I
> think it has a lot of potential, and also a few rough edges.  Like JS
> Arrow, the documentation is sparse and experimentation is required.  It
> does handle Arrow data well with possibly some caveats.  One is, if I
> recall correctly, that it seems to lose the ability to discriminate between
> 0 and null for integer columns after about 64K rows in size where we told
> Perspective that a column was supposed to be a nullable integer.  This was
> based on Arrow files generated via JS Arrow, which may be where the problem
> lies, I can't remember, or maybe we were doing something wrong.
>
> Perspective does offer streaming support, and I think the streamed data can
> be in Arrow IPC format.  It is also WASM and has good capabilities for
> parallelism using multiple workers.  And it's quite simple to use.
>
> I think Arrow for in-browser analytics has a lot of promise in terms of
> bandwidth, performance, and low memory usage. My team at work has started
> working on an analytics project and we've been trying Arrow as a data
> format.  For analytics datasets with no high-cardinality dimensions, it's
> been really great.  Taking a simple dataset of just over 100K rows that is
> 32MB+ in array-of-objects format in JSON, we were able to get it to right
> at 2MB via Arrow (~94% compression) with no decompression and basically no
> parsing of the data needed in the browser.  Once in the browser, we wrote
> some functions for filtering, group by, simple joins, etc., and the query
> performance has been quite good (around 10-50ms per query on average with a
> 100K+ row dataset for most slice and dice types of operations, a bit worse
> when joins are in play so we try to avoid them).  At this point these are
> all just iterations over the entire dataset(s) using a for-of loop and the
> Row proxy since we had some difficulties with the scan api, so there is
> room for improvement here.
>
> One key point is that we perform our own dictionary encoding of the data
> before generating the Arrow file, so basically all of the dimensional data
> in the Arrow file itself consists of just numbers (integers) that represent
> keys into an array of strings stored outside the Arrow file.  This improved
> the size of the Arrow file by ~50%.  It also speeds up the in-browser
> queries over the data in the browser by about 300%.  In a multipart mime
> response, we send down the Arrow file along with a JSON array that serves
> as the "dictionary."  In the browser, queries are run by transforming
> strings into the numeric keys contained in the Arrow file, performing the
> query, and then only at the end when the result is small is the data
> "unpacked" back into strings using the dictionary.  The 300ish% improvement
> mentioned includes the time for this packing and unpacking.
>
> For larger datasets, we've tried processing them using web workers just for
> experimentation purposes. We tried this with over 1M rows and it worked
> nicely, only slightly noticeable lag time for the end user when running
> slice and dice operations.  For our use case, a few 100K rows or less, the
> overhead of the web workers hasn't been worth it, but it would allow
> parallel processing if needed in the future.
>
> If your dataset has high-cardinality fields, then obviously compression
> will suffer greatly, etc., but for our specific use case this approach has
> shown a lot of promise.
>
> We haven't looked at streaming yet, but we've anticipated either using
> micro batches rather than real-time or handling the streaming data outside
> of Arrow since it should incrementally represent smaller amounts of data
> (e.g., queries in the browser might query over the Arrow data and
> separately the streamed data and aggregate the results, then periodically
> maybe put the out-of-band data into Arrow format in the browser).  This
> also would lend itself to parallel query processing via web workers. We
> haven't looked at Flight as of yet, but it sounds really interesting, and
> with WASM too, even better.
>
> ~Mike

Re: Arrow Flight + Go, Arrow for Realtime

Posted by Michael Stephenson <do...@gmail.com>.
Having spent a few solid days looking at Finos Perspective a while back, I
think it has a lot of potential, and also a few rough edges.  Like JS
Arrow, the documentation is sparse and experimentation is required.  It
does handle Arrow data well with possibly some caveats.  One is, if I
recall correctly, that it seems to lose the ability to discriminate between
0 and null for integer columns after about 64K rows in size where we told
Perspective that a column was supposed to be a nullable integer.  This was
based on Arrow files generated via JS Arrow, which may be where the problem
lies, I can't remember, or maybe we were doing something wrong.
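
For context, here is a minimal sketch (in Go, the language used elsewhere in
this thread) of how the Arrow columnar format keeps 0 and null apart: a
nullable column carries a validity bitmap next to its values buffer, one bit
per row, least-significant bit first, with a set bit meaning the value is
present. NullableInt32 and its methods are hypothetical names made up for
illustration; this is not the JS Arrow or Perspective API, and it does not
reproduce the behaviour described above.

```go
package main

import "fmt"

// NullableInt32 is a simplified model of an Arrow nullable integer column:
// a values buffer plus a validity bitmap (bit set = present, bit clear = null).
type NullableInt32 struct {
	values   []int32
	validity []byte
}

// appendSlot adds one row and records in the bitmap whether it is valid.
func (c *NullableInt32) appendSlot(valid bool, v int32) {
	i := len(c.values)
	if i%8 == 0 {
		c.validity = append(c.validity, 0) // start a new bitmap byte
	}
	if valid {
		c.validity[i/8] |= byte(1) << uint(i%8)
	}
	c.values = append(c.values, v)
}

// Append adds a non-null value.
func (c *NullableInt32) Append(v int32) { c.appendSlot(true, v) }

// AppendNull adds a null; the values slot holds 0 but the bit stays clear.
func (c *NullableInt32) AppendNull() { c.appendSlot(false, 0) }

// Len returns the number of rows.
func (c *NullableInt32) Len() int { return len(c.values) }

// IsNull reports whether row i is null by testing its validity bit.
func (c *NullableInt32) IsNull(i int) bool {
	return c.validity[i/8]&(byte(1)<<uint(i%8)) == 0
}

// Value returns the raw value at row i (meaningful only if !IsNull(i)).
func (c *NullableInt32) Value(i int) int32 { return c.values[i] }

func main() {
	col := &NullableInt32{}
	col.Append(0)    // a real zero
	col.AppendNull() // a missing reading
	col.Append(7)

	for i := 0; i < col.Len(); i++ {
		if col.IsNull(i) {
			fmt.Printf("row %d: null\n", i)
		} else {
			fmt.Printf("row %d: %d\n", i, col.Value(i))
		}
	}
}
```

A reader that looks only at the values buffer sees 0 for both the first and
second rows; the distinction lives entirely in the bitmap.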

Perspective does offer streaming support, and I think the streamed data can
be in Arrow IPC format.  It is also WASM and has good capabilities for
parallelism using multiple workers.  And it's quite simple to use.

I think Arrow for in-browser analytics has a lot of promise in terms of
bandwidth, performance, and low memory usage. My team at work has started
working on an analytics project and we've been trying Arrow as a data
format.  For analytics datasets with no high-cardinality dimensions, it's
been really great.  Taking a simple dataset of just over 100K rows that is
32MB+ in array-of-objects format in JSON, we were able to get it to right
at 2MB via Arrow (~94% compression) with no decompression and basically no
parsing of the data needed in the browser.  Once in the browser, we wrote
some functions for filtering, group by, simple joins, etc., and the query
performance has been quite good (around 10-50ms per query on average with a
100K+ row dataset for most slice and dice types of operations, a bit worse
when joins are in play so we try to avoid them).  At this point these are
all just iterations over the entire dataset(s) using a for-of loop and the
Row proxy since we had some difficulties with the scan api, so there is
room for improvement here.

One key point is that we perform our own dictionary encoding of the data
before generating the Arrow file, so basically all of the dimensional data
in the Arrow file itself consists of just numbers (integers) that represent
keys into an array of strings stored outside the Arrow file.  This reduced
the size of the Arrow file by ~50%.  It also speeds up in-browser queries
over the data by about 300%.  In a multipart MIME
response, we send down the Arrow file along with a JSON array that serves
as the "dictionary."  In the browser, queries are run by transforming
strings into the numeric keys contained in the Arrow file, performing the
query, and then only at the end when the result is small is the data
"unpacked" back into strings using the dictionary.  The 300ish% improvement
mentioned includes the time for this packing and unpacking.
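
The approach in the paragraph above can be sketched roughly as follows. This
is an illustration in Go rather than the team's actual browser code; Dict,
SumWhere, and the sample data are made-up names and values. Plain slices
stand in for the Arrow columns, and the Dict plays the role of the
out-of-band JSON "dictionary."

```go
package main

import "fmt"

// Dict maps strings to small integer keys and back.
type Dict struct {
	keys   map[string]int32
	values []string
}

// NewDict returns an empty dictionary.
func NewDict() *Dict {
	return &Dict{keys: make(map[string]int32)}
}

// Encode returns the key for s, assigning the next free key on first sight.
func (d *Dict) Encode(s string) int32 {
	if k, ok := d.keys[s]; ok {
		return k
	}
	k := int32(len(d.values))
	d.keys[s] = k
	d.values = append(d.values, s)
	return k
}

// Lookup returns the key for s without adding it; ok is false if s is unknown.
func (d *Dict) Lookup(s string) (int32, bool) {
	k, ok := d.keys[s]
	return k, ok
}

// Decode turns a key back into its string.
func (d *Dict) Decode(k int32) string { return d.values[k] }

// SumWhere adds up values[i] for every row whose encoded dimension equals
// the key for want. The string is translated once; the scan itself compares
// integers only.
func SumWhere(d *Dict, dim []int32, values []float64, want string) float64 {
	k, ok := d.Lookup(want)
	if !ok {
		return 0 // the value never appears in the column
	}
	var sum float64
	for i, key := range dim {
		if key == k {
			sum += values[i]
		}
	}
	return sum
}

func main() {
	d := NewDict()
	regions := []string{"east", "west", "east", "north", "west", "east"}
	sales := []float64{10, 20, 30, 40, 50, 60}

	// Encode the dimension column once, before the data is shipped.
	dim := make([]int32, len(regions))
	for i, r := range regions {
		dim[i] = d.Encode(r)
	}

	fmt.Println(dim)                             // integer keys, one per row
	fmt.Println(SumWhere(d, dim, sales, "east")) // query runs on keys
	fmt.Println(d.Decode(dim[3]))                // unpack only when needed
}
```

The string-to-key translation happens once per query, and decoding happens
only for the rows that survive into the result.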

For larger datasets, we've tried processing them using web workers just for
experimentation purposes. We tried this with over 1M rows and it worked
nicely, only slightly noticeable lag time for the end user when running
slice and dice operations.  For our use case, a few 100K rows or less, the
overhead of the web workers hasn't been worth it, but it would allow
parallel processing if needed in the future.

If your dataset has high-cardinality fields, then obviously compression
will suffer greatly, etc., but for our specific use case this approach has
shown a lot of promise.

We haven't looked at streaming yet, but we've anticipated either using
micro batches rather than real-time or handling the streaming data outside
of Arrow since it should incrementally represent smaller amounts of data
(e.g., queries in the browser might query over the Arrow data and
separately the streamed data and aggregate the results, then periodically
maybe put the out-of-band data into Arrow format in the browser).  This
also would lend itself to parallel query processing via web workers. We
haven't looked at Flight as of yet, but it sounds really interesting, and
with WASM too, even better.
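
One way to picture the "query the Arrow data and the streamed data
separately, then aggregate" idea is with mergeable partial results. The
sketch below is in Go and makes several assumptions: plain float slices
stand in for Arrow record batches, and Store, Partial, and aggregate are
hypothetical names, not part of any Arrow library.

```go
package main

import "fmt"

// Partial is a mergeable aggregate: enough state to compute a mean later.
type Partial struct {
	Sum   float64
	Count int
}

// Merge combines two partial results.
func (p Partial) Merge(q Partial) Partial {
	return Partial{Sum: p.Sum + q.Sum, Count: p.Count + q.Count}
}

// Mean finishes the aggregate; it returns 0 for an empty input.
func (p Partial) Mean() float64 {
	if p.Count == 0 {
		return 0
	}
	return p.Sum / float64(p.Count)
}

// aggregate scans one chunk of readings and returns its partial result.
func aggregate(xs []float64) Partial {
	var p Partial
	for _, x := range xs {
		p.Sum += x
		p.Count++
	}
	return p
}

// Store holds sealed chunks that are never modified again, plus a small
// mutable tail of rows that have streamed in since the last seal.
type Store struct {
	sealed [][]float64
	tail   []float64
}

// Append adds a newly streamed row to the tail.
func (s *Store) Append(v float64) { s.tail = append(s.tail, v) }

// Seal moves the tail into the sealed set (the periodic "put the
// out-of-band data into Arrow format" step) and starts a fresh tail.
func (s *Store) Seal() {
	if len(s.tail) == 0 {
		return
	}
	s.sealed = append(s.sealed, s.tail)
	s.tail = nil
}

// Query aggregates the sealed chunks and the tail separately, then merges.
func (s *Store) Query() Partial {
	var total Partial
	for _, chunk := range s.sealed {
		total = total.Merge(aggregate(chunk))
	}
	return total.Merge(aggregate(s.tail))
}

func main() {
	s := &Store{}
	for _, v := range []float64{1, 2, 3} {
		s.Append(v)
	}
	s.Seal()    // the first three rows become an immutable chunk
	s.Append(4) // this row is still in the streamed tail

	p := s.Query()
	fmt.Println(p.Sum, p.Count, p.Mean())
}
```

Because each chunk yields an independent Partial, the per-chunk scans could
also be handed to separate workers and merged afterwards.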

~Mike

On Sat, Aug 15, 2020 at 6:01 PM Pierre Belzile <pi...@gmail.com>
wrote:

> Mark,
>
> Did you take a look at FINOS Perspective? It seems to have some interesting
> overlaps with your goals. I've come across it but have not dug in.
>
> I'd be curious to get your thoughts on it.
>
> Cheers

Re: Arrow Flight + Go, Arrow for Realtime

Posted by Pierre Belzile <pi...@gmail.com>.
Mark,

Did you take a look at FINOS Perspective? It seems to have some interesting
overlaps with your goals. I've come across it but have not dug in.

I'd be curious to get your thoughts on it.

Cheers

On Sat., Aug. 15, 2020, 13:05 , <ma...@markfarnan.com> wrote:

> David,
>
> Still investigating, but I suspect for streaming I may have to fall back
> to some form of "custom" Flight implementation over WebSockets.
>
> That assumes Arrow/Flight actually makes sense for that link, which will
> probably depend on how well the data compresses. It would be very nice if
> it does, since that would allow a common format everywhere.
>
> The data I need to move around is highly variable in type (arrays of
> floats, ints, and strings, with occasional binary or vector columns; a
> vector is an array of arrays of floats in my case), and the number of
> columns and their types vary by dataset and visualization choices. So far
> Arrow seems a better choice than any 'roll your own' format, and it will
> be nice to use the same format on the client side as well as in the
> server system.
>
> My use case is primarily 'Get', consuming large datasets for
> visualization.   I doubt I'll need Put or Exchange from the browser.
>
> Mark.

RE: Arrow Flight + Go, Arrow for Realtime

Posted by ma...@markfarnan.com.
David, 

Still investigating, but I suspect for streaming I may have to fall back to some form of "custom" Flight implementation over WebSockets.

That assumes Arrow/Flight actually makes sense for that link, which will probably depend on how well the data compresses. It would be very nice if it does, since that would allow a common format everywhere.

The data I need to move around is highly variable in type (arrays of floats, ints, and strings, with occasional binary or vector columns; a vector is an array of arrays of floats in my case), and the number of columns and their types vary by dataset and visualization choices. So far Arrow seems a better choice than any 'roll your own' format, and it will be nice to use the same format on the client side as well as in the server system.

My use case is primarily 'Get', consuming large datasets for visualization. I doubt I'll need Put or Exchange from the browser.

Mark. 

-----Original Message-----
From: David Li <li...@gmail.com> 
Sent: Saturday, August 15, 2020 5:53 PM
To: dev@arrow.apache.org
Subject: Re: Arrow Flight + Go, Arrow for Realtime

I am curious what you accomplish with Arrow + Flight from the browser.
Right now, Flight is all gRPC-based, and browser compatibility is a bit mixed. I expect the various transcoders/gRPC-Web can handle GetFlightInfo/DoGet fine, though IIRC for DoGet, at least some of the transcoders would have to buffer the entire stream before sending it to the browser. DoPut/DoExchange seem harder/impossible to bridge right now due to the bidirectional streaming.

Best,
David



Re: Arrow Flight + Go, Arrow for Realtime

Posted by David Li <li...@gmail.com>.
I am curious what you accomplish with Arrow + Flight from the browser.
Right now, Flight is all gRPC-based, and browser compatibility is a
bit mixed. I expect the various transcoders/gRPC-Web can handle
GetFlightInfo/DoGet fine, though IIRC for DoGet, at least some of the
transcoders would have to buffer the entire stream before sending it
to the browser. DoPut/DoExchange seem harder/impossible to bridge
right now due to the bidirectional streaming.

Best,
David

On 8/14/20, mark@markfarnan.com <ma...@markfarnan.com> wrote:
> Thanks Wes,
>
> I'll likely work on that once I get my head around Arrow in general and
> confirm we will use it for the project.
>
> How to handle streaming appends to an otherwise immutable dataset is my
> current concern. Still thinking that through.
>
> Regards
>
> Mark.

RE: Arrow Flight + Go, Arrow for Realtime

Posted by ma...@markfarnan.com.
Thanks Wes, 

I'll likely work on that once I get my head around Arrow in general and confirm we will use it for the project.

How to handle streaming appends to an otherwise immutable dataset is my current concern. Still thinking through that.

Regards

Mark.

-----Original Message-----
From: Wes McKinney <we...@gmail.com> 
Sent: Wednesday, August 12, 2020 3:59 PM
To: dev <de...@arrow.apache.org>
Subject: Re: Arrow Flight + Go, Arrow for Realtime

There's a WIP patch for Flight support in Go

https://github.com/apache/arrow/pull/6731

I hope to see someone taking up this work as first-class Flight support in Go would be very useful for building data services.

On Wed, Aug 12, 2020 at 5:08 AM Adam Lippai <ad...@rigo.sk> wrote:
>
> Arrow is mainly about batching data and leveraging all the 
> opportunities this gives.
> This means you either have to buffer the data yourself and flush it 
> when a reasonable sized batch is complete or play with preallocating 
> Arrow structures
> This was discussed recently, you might be interested in the thread:
> https://www.mail-archive.com/dev@arrow.apache.org/msg19862.html
>
> Note: I'm not an Arrow developer, I'm just following the "streaming"
> features of the Arrow lib, I'm interested in having a "rolling window" 
> API (like a fixed size FIFO queue).
>
> Best regards,
> Adam Lippai
>
> On Wed, Aug 12, 2020 at 11:29 AM <ma...@markfarnan.com> wrote:
>
> > I'm looking at using Arrow for a realtime IoT project which includes 
> > use cases both on server, and also for transferring /using in a 
> > Browser via WASM,  and have a few  questions.
> >
> >
> >
> > Language in use is Go.
> >
> >
> >
> > Is anyone working on implementing   Arrow-Flight in Go ?      (According to
> > the feature matrix,  nothing ready yet, so wanted to check.
> >
> >
> >
> > Has anyone tried using Apache Arrow in  Go WASM  (Webassembly) ?   if so,
> > any issues ?
> >
> >
> >
> > Any pointers/documentation  on using/extending Arrow for realtime streaming
> > cases.   (Specifically where a DataFrame is requested, but then it needs to
> > 'grow' as new data arrives, often at high speed).
> >
> > Not language specific, just trying to understand the right pattern 
> > for using Arrow for this,  and couldn't' find much in the docs.
> >
> >
> >
> > Regards
> >
> >
> >
> > Mark.
> >
> >


Re: Arrow Flight + Go, Arrow for Realtime

Posted by Wes McKinney <we...@gmail.com>.
There's a WIP patch for Flight support in Go

https://github.com/apache/arrow/pull/6731

I hope to see someone taking up this work as first-class Flight
support in Go would be very useful for building data services.

On Wed, Aug 12, 2020 at 5:08 AM Adam Lippai <ad...@rigo.sk> wrote:
>
> Arrow is mainly about batching data and leveraging all the opportunities
> this gives.
> This means you either have to buffer the data yourself and flush it when a
> reasonable sized batch is complete or play with preallocating Arrow
> structures
> This was discussed recently, you might be interested in the thread:
> https://www.mail-archive.com/dev@arrow.apache.org/msg19862.html
>
> Note: I'm not an Arrow developer, I'm just following the "streaming"
> features of the Arrow lib, I'm interested in having a "rolling window" API
> (like a fixed size FIFO queue).
>
> Best regards,
> Adam Lippai
>
> On Wed, Aug 12, 2020 at 11:29 AM <ma...@markfarnan.com> wrote:
>
> > I'm looking at using Arrow for a realtime IoT project which includes use
> > cases both on server, and also for transferring /using in a Browser via
> > WASM,  and have a few  questions.
> >
> >
> >
> > Language in use is Go.
> >
> >
> >
> > Is anyone working on implementing   Arrow-Flight in Go ?      (According to
> > the feature matrix,  nothing ready yet, so wanted to check.
> >
> >
> >
> > Has anyone tried using Apache Arrow in  Go WASM  (Webassembly) ?   if so,
> > any issues ?
> >
> >
> >
> > Any pointers/documentation  on using/extending Arrow for realtime streaming
> > cases.   (Specifically where a DataFrame is requested, but then it needs to
> > 'grow' as new data arrives, often at high speed).
> >
> > Not language specific, just trying to understand the right pattern for
> > using
> > Arrow for this,  and couldn't' find much in the docs.
> >
> >
> >
> > Regards
> >
> >
> >
> > Mark.
> >
> >

Re: Arrow Flight + Go, Arrow for Realtime

Posted by Adam Lippai <ad...@rigo.sk>.
Arrow is mainly about batching data and leveraging all the opportunities
this gives.
This means you either have to buffer the data yourself and flush it when a
reasonably sized batch is complete, or play with preallocating Arrow
structures.
This was discussed recently; you might be interested in the thread:
https://www.mail-archive.com/dev@arrow.apache.org/msg19862.html
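
As a rough illustration of the buffer-and-flush option, here is a minimal Go
sketch. It deliberately does not use the Arrow library: Sample, Batch and
Batcher are hypothetical names, and the plain slices stand in for the Arrow
arrays that a real version would build in the flush callback.

```go
package main

import "fmt"

// Sample is a hypothetical IoT reading; the field names are illustrative.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// Batch is a column-oriented group of samples that is treated as read-only
// once emitted. It stands in for an Arrow record batch in this sketch.
type Batch struct {
	Timestamps []int64
	Values     []float64
}

// Batcher buffers incoming samples and hands a completed Batch to flush
// whenever batchSize samples have accumulated.
type Batcher struct {
	batchSize int
	ts        []int64
	vals      []float64
	flush     func(Batch)
}

// NewBatcher assumes batchSize > 0.
func NewBatcher(batchSize int, flush func(Batch)) *Batcher {
	return &Batcher{
		batchSize: batchSize,
		ts:        make([]int64, 0, batchSize),
		vals:      make([]float64, 0, batchSize),
		flush:     flush,
	}
}

// Append adds one sample and flushes when the buffer is full.
func (b *Batcher) Append(s Sample) {
	b.ts = append(b.ts, s.TimestampMs)
	b.vals = append(b.vals, s.Value)
	if len(b.ts) >= b.batchSize {
		b.Flush()
	}
}

// Flush emits whatever is buffered (possibly a short batch) and allocates
// fresh buffers, so the emitted slices are never written to again.
func (b *Batcher) Flush() {
	if len(b.ts) == 0 {
		return
	}
	b.flush(Batch{Timestamps: b.ts, Values: b.vals})
	b.ts = make([]int64, 0, b.batchSize)
	b.vals = make([]float64, 0, b.batchSize)
}

func main() {
	var batches []Batch
	b := NewBatcher(3, func(batch Batch) { batches = append(batches, batch) })
	for i := 0; i < 7; i++ {
		b.Append(Sample{TimestampMs: int64(i) * 1000, Value: float64(i)})
	}
	b.Flush() // flush the partial tail, e.g. on a timer or at shutdown
	for i, batch := range batches {
		fmt.Printf("batch %d: %d rows\n", i, len(batch.Values))
	}
}
```

In a realtime setting the explicit Flush call would probably also be driven
by a timer, so that slow streams still deliver short batches with bounded
latency.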

Note: I'm not an Arrow developer, I'm just following the "streaming"
features of the Arrow lib, I'm interested in having a "rolling window" API
(like a fixed size FIFO queue).
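
To make the "rolling window" idea concrete, here is a small sketch of a
fixed-size FIFO as a ring buffer in plain Go. This is not an existing Arrow
API; Window, Push and Snapshot are hypothetical names, and a single float64
column is assumed for brevity.

```go
package main

import "fmt"

// Window is a fixed-capacity FIFO of float64 values. Once it is full, each
// Push overwrites the oldest element.
type Window struct {
	buf   []float64
	start int // index of the oldest element
	n     int // number of valid elements
}

// NewWindow assumes capacity > 0.
func NewWindow(capacity int) *Window {
	return &Window{buf: make([]float64, capacity)}
}

// Push appends v, evicting the oldest value if the window is full.
func (w *Window) Push(v float64) {
	if w.n < len(w.buf) {
		w.buf[(w.start+w.n)%len(w.buf)] = v
		w.n++
		return
	}
	w.buf[w.start] = v
	w.start = (w.start + 1) % len(w.buf)
}

// Snapshot copies the window contents, oldest first, into a new slice that
// can be handed off as an immutable batch.
func (w *Window) Snapshot() []float64 {
	out := make([]float64, w.n)
	for i := range out {
		out[i] = w.buf[(w.start+i)%len(w.buf)]
	}
	return out
}

func main() {
	w := NewWindow(4)
	for i := 1; i <= 6; i++ {
		w.Push(float64(i))
	}
	fmt.Println(w.Snapshot()) // prints [3 4 5 6]
}
```

The copy in Snapshot is the price of keeping the emitted data immutable while
the window keeps moving.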

Best regards,
Adam Lippai

On Wed, Aug 12, 2020 at 11:29 AM <ma...@markfarnan.com> wrote:

> I'm looking at using Arrow for a realtime IoT project which includes use
> cases both on server, and also for transferring /using in a Browser via
> WASM,  and have a few  questions.
>
>
>
> Language in use is Go.
>
>
>
> Is anyone working on implementing   Arrow-Flight in Go ?      (According to
> the feature matrix,  nothing ready yet, so wanted to check.
>
>
>
> Has anyone tried using Apache Arrow in  Go WASM  (Webassembly) ?   if so,
> any issues ?
>
>
>
> Any pointers/documentation  on using/extending Arrow for realtime streaming
> cases.   (Specifically where a DataFrame is requested, but then it needs to
> 'grow' as new data arrives, often at high speed).
>
> Not language specific, just trying to understand the right pattern for
> using
> Arrow for this,  and couldn't' find much in the docs.
>
>
>
> Regards
>
>
>
> Mark.
>
>

Re: Arrow Flight + Go, Arrow for Realtime

Posted by fredgan <ga...@vip.qq.com>.
Hi Sebastien,
I am really interested in this feature. I would like to join in the development. But I have to take time to familiarize myself with it. I will try it!
Fred





------------------ Original ------------------
From: Sebastien Binet <work@sbinet.org>
Date: Wed, Aug 12, 2020 10:47 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: Arrow Flight + Go, Arrow for Realtime



Mark,

AFAIK, nobody's actively working on Arrow-Flight for Go (I think somebody started
that work at some point but I don't remember anything hitting the main repo)

as for Go+WASM:

https://lists.apache.org/thread.html/e15dc80debf9dea1b33581fa6ba95fd84b57c0ccd0162505d5d25079%40%3Cdev.arrow.apache.org%3E

ie:
===
I've just tried compiling this example:
-  https://godoc.org/github.com/apache/arrow/go/arrow#example-package--Table
to wasm.
compilation went fine:

$> GOOS=js GOARCH=wasm go build -o foo.wasm foo.go
$> go-wasm ./foo.wasm
rec[0]["f1-i32"]: [1 2 3 4 5]
rec[0]["f2-f64"]: [1 2 3 4 5]
rec[1]["f1-i32"]: [6 7 8 (null) 10]
rec[1]["f2-f64"]: [6 7 8 9 10]
rec[2]["f1-i32"]: [11 12 13 14 15]
rec[2]["f2-f64"]: [11 12 13 14 15]
rec[3]["f1-i32"]: [16 17 18 19 20]
rec[3]["f2-f64"]: [16 17 18 19 20]

and it ran fine once this patch was added:
- https://github.com/apache/arrow/pull/3707

hth,
-s

PS: go-wasm is an alias of mine for this file:
https://github.com/golang/go/blob/master/misc/wasm/go_js_wasm_exec
===


hth,
-s



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, August 12, 2020 11:29 AM, <mark@markfarnan.com> wrote:

> I'm looking at using Arrow for a realtime IoT project which includes use
> cases both on server, and also for transferring /using in a Browser via
> WASM, and have a few questions.
>
> Language in use is Go.
>
> Is anyone working on implementing Arrow-Flight in Go ? (According to
> the feature matrix, nothing ready yet, so wanted to check.
>
> Has anyone tried using Apache Arrow in Go WASM (Webassembly) ? if so,
> any issues ?
>
> Any pointers/documentation on using/extending Arrow for realtime streaming
> cases. (Specifically where a DataFrame is requested, but then it needs to
> 'grow' as new data arrives, often at high speed).
>
> Not language specific, just trying to understand the right pattern for using
> Arrow for this, and couldn't' find much in the docs.
>
> Regards
>
> Mark.

RE: Arrow Flight + Go, Arrow for Realtime

Posted by ma...@markfarnan.com.
Thanks Wes & Sebastien, 

I've tested Arrow in Go-WASM now and it is working fine. I'm still getting my head around the best way to use it for our use case (IoT data).

My goal here is to hit a Flight endpoint from the browser (Go-WASM specifically) and pull all or part of an Arrow dataset on the server into the browser for visualization and local analysis.

One of the issues I will contend with is that visualization can 'walk backwards' in a larger dataset (scrolling up), not just forwards like an analytic generally does.

The second goal is to update this visualization from a realtime stream, which can be as fast as 1 sample per second.

I'm wondering if a use (abuse?) of batches and pre-allocation might work for streaming updates.


Note: It may be that using Arrow like this for visualization is not appropriate, but I think it would be great if it can work.
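
To make the batches-plus-pre-allocation idea concrete, here is a rough sketch
in plain Go (no Arrow calls). Series, Append and Range are hypothetical names:
full chunks are treated as immutable, only the preallocated tail chunk is
written to, and Range reads any row span by global index, so a viewer can
scroll backwards as well as forwards.

```go
package main

import "fmt"

// Series stores values in fixed-size chunks. Full chunks are treated as
// immutable; only the last (tail) chunk is ever appended to.
type Series struct {
	chunkSize int
	chunks    [][]float64
}

// NewSeries assumes chunkSize > 0.
func NewSeries(chunkSize int) *Series {
	return &Series{chunkSize: chunkSize}
}

// Append writes into the preallocated tail chunk and opens a new chunk once
// the tail is full, so existing chunks are never reallocated or moved.
func (s *Series) Append(v float64) {
	last := len(s.chunks) - 1
	if last < 0 || len(s.chunks[last]) == s.chunkSize {
		s.chunks = append(s.chunks, make([]float64, 0, s.chunkSize))
		last++
	}
	s.chunks[last] = append(s.chunks[last], v)
}

// Len returns the total number of rows across all chunks.
func (s *Series) Len() int {
	if len(s.chunks) == 0 {
		return 0
	}
	last := len(s.chunks) - 1
	return last*s.chunkSize + len(s.chunks[last])
}

// Range copies rows [from, to) into a new slice, clamping to valid bounds.
func (s *Series) Range(from, to int) []float64 {
	if from < 0 {
		from = 0
	}
	if n := s.Len(); to > n {
		to = n
	}
	if from >= to {
		return nil
	}
	out := make([]float64, 0, to-from)
	for i := from; i < to; {
		c, off := i/s.chunkSize, i%s.chunkSize
		end := off + (to - i)
		if end > len(s.chunks[c]) {
			end = len(s.chunks[c])
		}
		out = append(out, s.chunks[c][off:end]...)
		i += end - off
	}
	return out
}

func main() {
	s := NewSeries(4)
	for i := 0; i < 10; i++ {
		s.Append(float64(i))
	}
	fmt.Println(s.Range(6, 10)) // newest rows: prints [6 7 8 9]
	fmt.Println(s.Range(2, 6))  // scrolling back: prints [2 3 4 5]
}
```

With Arrow itself, each full chunk would presumably become a record batch and
only the tail would need a builder or preallocated buffers.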

Regards

Mark.

-----Original Message-----
From: Sebastien Binet <wo...@sbinet.org> 
Sent: Wednesday, August 12, 2020 1:53 PM
To: dev@arrow.apache.org
Subject: Re: Arrow Flight + Go, Arrow for Realtime

Mark,

AFAIK, nobody's actively working on Arrow-Flight for Go (I think somebody started that work at some point but I don't remember anything hitting the main repo)

as for Go+WASM:

https://lists.apache.org/thread.html/e15dc80debf9dea1b33581fa6ba95fd84b57c0ccd0162505d5d25079%40%3Cdev.arrow.apache.org%3E

ie:
===
I've just tried compiling this example:
-  https://godoc.org/github.com/apache/arrow/go/arrow#example-package--Table
to wasm.
compilation went fine:

$> GOOS=js GOARCH=wasm go build -o foo.wasm foo.go
$> go-wasm ./foo.wasm
rec[0]["f1-i32"]: [1 2 3 4 5]
rec[0]["f2-f64"]: [1 2 3 4 5]
rec[1]["f1-i32"]: [6 7 8 (null) 10]
rec[1]["f2-f64"]: [6 7 8 9 10]
rec[2]["f1-i32"]: [11 12 13 14 15]
rec[2]["f2-f64"]: [11 12 13 14 15]
rec[3]["f1-i32"]: [16 17 18 19 20]
rec[3]["f2-f64"]: [16 17 18 19 20]

and it ran fine once this patch was added:
- https://github.com/apache/arrow/pull/3707

hth,
-s

PS: go-wasm is an alias of mine for this file:
https://github.com/golang/go/blob/master/misc/wasm/go_js_wasm_exec
===


hth,
-s



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, August 12, 2020 11:29 AM, <ma...@markfarnan.com> wrote:

> I'm looking at using Arrow for a realtime IoT project which includes 
> use cases both on server, and also for transferring /using in a 
> Browser via WASM, and have a few questions.
>
> Language in use is Go.
>
> Is anyone working on implementing Arrow-Flight in Go ? (According to 
> the feature matrix, nothing ready yet, so wanted to check.
>
> Has anyone tried using Apache Arrow in Go WASM (Webassembly) ? if so, 
> any issues ?
>
> Any pointers/documentation on using/extending Arrow for realtime 
> streaming cases. (Specifically where a DataFrame is requested, but 
> then it needs to 'grow' as new data arrives, often at high speed).
>
> Not language specific, just trying to understand the right pattern for 
> using Arrow for this, and couldn't' find much in the docs.
>
> Regards
>
> Mark.




Re: Arrow Flight + Go, Arrow for Realtime

Posted by Sebastien Binet <wo...@sbinet.org>.
Mark,

AFAIK, nobody's actively working on Arrow-Flight for Go (I think somebody started
that work at some point but I don't remember anything hitting the main repo)

as for Go+WASM:

https://lists.apache.org/thread.html/e15dc80debf9dea1b33581fa6ba95fd84b57c0ccd0162505d5d25079%40%3Cdev.arrow.apache.org%3E

ie:
===
I've just tried compiling this example:
-  https://godoc.org/github.com/apache/arrow/go/arrow#example-package--Table
to wasm.
compilation went fine:

$> GOOS=js GOARCH=wasm go build -o foo.wasm foo.go
$> go-wasm ./foo.wasm
rec[0]["f1-i32"]: [1 2 3 4 5]
rec[0]["f2-f64"]: [1 2 3 4 5]
rec[1]["f1-i32"]: [6 7 8 (null) 10]
rec[1]["f2-f64"]: [6 7 8 9 10]
rec[2]["f1-i32"]: [11 12 13 14 15]
rec[2]["f2-f64"]: [11 12 13 14 15]
rec[3]["f1-i32"]: [16 17 18 19 20]
rec[3]["f2-f64"]: [16 17 18 19 20]

and it ran fine once this patch was added:
- https://github.com/apache/arrow/pull/3707

hth,
-s

PS: go-wasm is an alias of mine for this file:
https://github.com/golang/go/blob/master/misc/wasm/go_js_wasm_exec
===


hth,
-s



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, August 12, 2020 11:29 AM, <ma...@markfarnan.com> wrote:

> I'm looking at using Arrow for a realtime IoT project which includes use
> cases both on server, and also for transferring /using in a Browser via
> WASM, and have a few questions.
>
> Language in use is Go.
>
> Is anyone working on implementing Arrow-Flight in Go ? (According to
> the feature matrix, nothing ready yet, so wanted to check.
>
> Has anyone tried using Apache Arrow in Go WASM (Webassembly) ? if so,
> any issues ?
>
> Any pointers/documentation on using/extending Arrow for realtime streaming
> cases. (Specifically where a DataFrame is requested, but then it needs to
> 'grow' as new data arrives, often at high speed).
>
> Not language specific, just trying to understand the right pattern for using
> Arrow for this, and couldn't' find much in the docs.
>
> Regards
>
> Mark.