Posted to dev@arrow.apache.org by Johan Peltenburg - EWI <J....@tudelft.nl> on 2018/02/09 21:11:06 UTC

[Follow-up] Development of an FPGA Accelerator framework around Apache Arrow

Dear community,

As a follow-up to the e-mail below, we have made public the repository for our framework, Fletcher: a framework to integrate FPGA accelerators with Apache Arrow.

https://github.com/johanpel/fletcher

With this framework you provide an Arrow schema, from which an easy-to-use hardware interface for FPGAs is generated, so you keep all the benefits that Arrow already offers. It also increases the programmability of any acceleration project you want to build on top of Arrow. At run time, you simply pass your Arrow table to the run-time part of the framework; your hardware can then read from it using row index ranges, receiving streams of data of the type you defined through the schema.

Currently there is an example project that does regular expression matching on an Arrow table with strings, running on the Amazon EC2 F1 platform. We are not sponsored by Amazon, but as anyone can launch an instance with an FPGA there, we thought it would be a good starting point to hopefully gain some interest, even if you don't have an FPGA card yourself.
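
To make the idea of row index ranges concrete, here is a minimal software sketch of the regular expression example. It is not Fletcher code: the helper names `split_row_ranges` and `count_matches`, the number of units, and the pattern are assumptions chosen for illustration, and a plain Python list stands in for the Arrow string column.

```python
import re

def split_row_ranges(num_rows, num_units):
    """Split [0, num_rows) into at most num_units contiguous [first, last) ranges."""
    base, extra = divmod(num_rows, num_units)
    ranges, first = [], 0
    for unit in range(num_units):
        last = first + base + (1 if unit < extra else 0)
        if last > first:  # skip units that would receive no rows
            ranges.append((first, last))
        first = last
    return ranges

def count_matches(strings, pattern, first, last):
    """Software stand-in for one matcher unit: count rows in [first, last)
    whose whole string matches the pattern."""
    regex = re.compile(pattern)
    return sum(1 for row in strings[first:last] if regex.fullmatch(row))

# A plain list stands in for a column of strings (hypothetical data).
column = ["arrow", "fpga", "arrow-fpga", "fletcher", "arrow"]
ranges = split_row_ranges(len(column), num_units=2)
total = sum(count_matches(column, r"arrow.*", first, last) for first, last in ranges)
```

Each range can be handled independently, which is what makes it natural to hand different ranges to different hardware units.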

FPGA accelerators can be so fast that, more often than not, serialization costs a relatively large part of the performance. Our measurements in this (relatively simple) example show that by using Arrow to avoid serialization, we sometimes get up to a 6x improvement in performance over not using Arrow, especially when starting from languages that run on a JVM. (Thanks everyone!)

We look forward to people with a little bit of FPGA experience trying it out, and to receiving their thoughts, comments, etc. Please drop me an e-mail.

With kind regards,

Johan Peltenburg
Computer Engineering Lab
Delft University of Technology
________________________________________
From: Johan Peltenburg [j.w.peltenburg@tudelft.nl]
Sent: Tuesday, November 28, 2017 16:29
To: dev@arrow.apache.org
Subject: Development of an FPGA Accelerator framework around Apache Arrow

Dear community,

Over the last year we have been looking into the integration of FPGA accelerators with big data frameworks such as Spark. Before Arrow took off, we ran into many issues, such as serialization overhead, garbage collection, and language interoperability problems with our low-level stack. These are all problems that Arrow now solves for us in a very nice manner.

We see growing support from infrastructure providers such as Amazon, which already offer instances with FPGA resources. We also see very rapid advances on the hardware technology side, where accelerators will soon be attachable to host memory cache-coherently (for example through OpenCAPI), allowing them to work in the same virtual address space as the host process.

We believe that a somewhat standardized in-memory data format like Arrow can help us generalize big data processing on FPGAs tremendously. At the same time, we know that FPGAs are notorious for their long development time and low programmability. Therefore, to lift some of this burden from accelerator developers, we are building a generalized framework around Arrow that abstracts away a very cumbersome aspect of FPGA design: interfacing with the data.

The framework takes Arrow schemas as input and generates a layer that on one side interfaces with whatever the host platform provides to access host memory (our initial framework will target AXI and OpenCAPI), and on the other side interfaces with the user kernel.

The user can express requests for access to the data in terms of row index ranges. The generated layer then provides data streams to the user, which the user may read with a kernel designed using high-level synthesis (for example, the kernel could be written in OpenCL). Users therefore no longer need to go into the specifics of the Arrow in-memory format, create hardware constructs to deal with index buffers and validity buffers, interface with the host-side bus, implement FIFOs, and so on. Hopefully this will speed up the deployment of FPGA-accelerated applications based on data represented in the Arrow format.
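
As a rough illustration of the bookkeeping that such a generated layer hides, the sketch below reads a row index range from a string column in software. It assumes the "index buffers" mentioned above correspond to what the Arrow format calls offsets buffers, and it follows the Arrow layout for variable-length strings (validity bitmap with the least-significant bit first, little-endian 32-bit offsets, concatenated values). The function name `read_string_rows` and the sample data are hypothetical, and this is software only, not the hardware design.

```python
import struct

def read_string_rows(validity, offsets, values, first, last):
    """Yield rows [first, last) of an Arrow-style UTF-8 string column.

    validity: bitmap bytes, one bit per row, least-significant bit first
    offsets:  little-endian int32 buffer with num_rows + 1 entries
    values:   concatenated UTF-8 bytes of all strings
    """
    for row in range(first, last):
        is_valid = (validity[row // 8] >> (row % 8)) & 1
        if not is_valid:
            yield None  # null row: its offsets are not consulted
            continue
        # Row i spans values[offsets[i]:offsets[i + 1]].
        start, end = struct.unpack_from("<ii", offsets, 4 * row)
        yield values[start:end].decode("utf-8")

# Hand-built buffers for the column ["fpga", None, "arrow"].
validity = bytes([0b00000101])            # rows 0 and 2 are valid
offsets = struct.pack("<4i", 0, 4, 4, 9)  # num_rows + 1 offsets
values = b"fpgaarrow"
rows = list(read_string_rows(validity, offsets, values, 0, 3))
```

A user kernel sitting behind the generated layer would only see the resulting stream of strings for the requested range, not the three buffers.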

Currently the framework supports schemas of primitive data types, (nested) lists and structs. The major challenge here was generating hardware structures from the many forms of schemas that users may provide, but these challenges have been solved. We are in the process of testing the framework in simulation and will soon move to tests on real FPGA systems. With a bit of luck, we hope to make an initial release of our framework in January.
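
To give a feel for why nested schemas are the hard part, the following sketch walks a schema recursively and lists the buffers each field needs under the Arrow columnar layout (validity for every field, data for primitives, offsets plus a child for lists, children for structs). The type mini-notation and the function name `required_buffers` are assumptions made for this example; this is neither an Arrow API nor the framework's actual generator.

```python
def required_buffers(name, dtype):
    """Return the buffer names needed for a field of the given type.

    dtype is a hypothetical mini-notation, not an Arrow API:
      "int32"                     -> primitive
      ("list", child_dtype)       -> variable-length list
      ("struct", {field: dtype})  -> struct
    """
    # Every field may carry a validity bitmap (Arrow allows omitting it
    # when there are no nulls; it is always listed here for simplicity).
    buffers = [f"{name}.validity"]
    if isinstance(dtype, str):
        buffers.append(f"{name}.data")
    elif dtype[0] == "list":
        buffers.append(f"{name}.offsets")
        buffers += required_buffers(f"{name}.item", dtype[1])
    elif dtype[0] == "struct":
        for child_name, child_dtype in dtype[1].items():
            buffers += required_buffers(f"{name}.{child_name}", child_dtype)
    else:
        raise ValueError(f"unsupported type: {dtype!r}")
    return buffers

# Hypothetical schema: an integer id and a list of byte lists.
schema = {"id": "int64", "tags": ("list", ("list", "uint8"))}
layout = [b for field, dtype in schema.items() for b in required_buffers(field, dtype)]
```

Every extra level of nesting adds another pair of buffers, and a hardware generator has to produce a matching structure for each of them.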

We will fully open-source this framework and will attempt to make it as vendor-independent as possible. Initially we hope to provide some example applications that demonstrate the productivity benefits of using our framework, and the benefits of using FPGAs for specific big data problems in general.

We are reaching out for your comments, questions, suggestions, etc. Please give us your thoughts about this. Thank you in advance.

With kind regards,

Johan Peltenburg
Computer Engineering Lab
Delft University of Technology

Re: [Follow-up] Development of an FPGA Accelerator framework around Apache Arrow

Posted by Wes McKinney <we...@gmail.com>.
hi Johan,

I'm also very excited to see the possibilities of using Arrow with
FPGAs. Would you be interested in adding this project to the Powered
By page (http://arrow.apache.org/powered_by/)? If so, feel free to
submit a pull request into the site/ portion of the project.

best
Wes

On Sun, Feb 11, 2018 at 6:42 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> Dear Johan,
>
> this is an exciting use case for Arrow. Nice to hear about the benefits that Arrow brings to the world of FPGAs.
>
> Greetings
>
> Uwe

Re: [Follow-up] Development of an FPGA Accelerator framework around Apache Arrow

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Dear Johan,

this is an exciting use case for Arrow. Nice to hear about the benefits that Arrow brings to the world of FPGAs.

Greetings

Uwe
