Posted to issues@hop.apache.org by "hansva (via GitHub)" <gi...@apache.org> on 2023/02/17 13:34:24 UTC

[GitHub] [hop] hansva opened a new issue, #2421: [Feature Request]: Create new Data Type Arrow Vectors

hansva opened a new issue, #2421:
URL: https://github.com/apache/hop/issues/2421

   ### What would you like to happen?
   
   Migration: https://issues.apache.org/jira/browse/HOP-4121
   
   I'd like to add a new core data type: an array of Apache Arrow FieldVectors. This would allow Hop to integrate with files and services that use Arrow-based data formats. Examples include the Feather file format, the Google BigQuery Storage Read API, and Neo4j's new streaming support in its Graph Data Science library (which uses Apache Arrow Flight).
   
   My initial proposal is to:
   
   - add the core ValueMeta type ValueMetaArrowVectors
   - add an ArrowEncode transformer for converting Hop row-wise data into batches of Arrow vectors
   - add an ArrowDecode transformer for converting Arrow vectors to one or many Hop rows
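
To make the encode/decode idea concrete, here is a minimal, language-neutral sketch of the row-to-column pivot the two transformers would perform. It is not Hop or Arrow code: plain Python lists stand in for Arrow vectors, and the names `encode_rows`, `decode_batch`, and `batch_size` are hypothetical, chosen only for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]
# Assumption: a "batch" is a mapping of field name -> list of values.
# A plain list stands in for an Arrow vector here.
Batch = Dict[str, List[Any]]


def encode_rows(rows: Iterable[Row], field_names: List[str],
                batch_size: int) -> Iterator[Batch]:
    """Pivot row-wise data into columnar batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # zip(*chunk) transposes the chunk: one tuple per column.
        columns = zip(*chunk)
        yield {name: list(col) for name, col in zip(field_names, columns)}


def decode_batch(batch: Batch) -> Iterator[Row]:
    """Turn one columnar batch back into individual rows."""
    # zip over the columns walks them in lockstep, one row at a time.
    yield from zip(*batch.values())


rows = [(1, "a"), (2, "b"), (3, "c")]
batches = list(encode_rows(rows, ["id", "name"], batch_size=2))
decoded = [row for batch in batches for row in decode_batch(batch)]
```

With three input rows and a batch size of 2, the encoder emits two batches (two rows, then one), and decoding the batches in order restores the original rows. A real implementation would also need to handle typed vectors, nulls, and memory allocation, which this sketch ignores.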
   
   This would lay a general-purpose foundation for others to build plugins that leverage Arrow, similar to how Avro support was recently merged.
   
   I have not opened a PR yet because I'm still adding support for vectors of lists/arrays. (So far I've implemented support for vectors of scalar Hop Integers, Numbers, Strings, Booleans, Dates, and Timestamps.) For those curious, my WIP fork is available in the "arrow" branch at https://github.com/neo4j-field/hop
   
   ### Issue Priority
   
   Priority: 3
   
   ### Issue Component
   
   Component: Other

