Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/17 05:04:03 UTC

[GitHub] [arrow] emkornfield commented on issue #12618: [RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry, increase adoption/audience and productivity.

emkornfield commented on issue #12618:
URL: https://github.com/apache/arrow/issues/12618#issuecomment-1070329458


   I looked through this at a high level (I think I might have already mentioned some of this on the mailing list). Here are a few comments:
   0.  I think having easy conversion from map-based rows to a VectorSchemaRoot is valuable.  Would the intention be to have a mapping from a Java object for every Arrow data type?  I think some of the existing getObject calls don't return the optimal types; would the intention be to follow those mappings when possible?
   1.  I'm hesitant to create a class named Dataframe in the project just for easy conversion back and forth between tuples.  I think DataFrames come with a lot of expectations. In particular, the canonical memory representation here seems to be row-based on-heap objects, whereas I would expect an implementation to use a columnar representation (and at least use the concept of Vectors for columns even if VectorSchemaRoot isn't used).
   2.  I started a mailing list discussion on the minimum Java version, but I believe we should be targeting at most JDK 11 for the time being.
   3.  For conversion from strings you need to pass an explicit UTF-8 charset (for example StandardCharsets.UTF_8) rather than rely on the platform default, to avoid brittleness in conversion.
   4.  I think it would be worth trying to implement this in the pattern of [Loader](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorLoader.html) and [Unloader](https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/VectorUnloader.html).  Maybe a new pair of interfaces like VectorRowLoader and VectorRowUnloader?  If the goal is to interface well with Flight, I think this might be the most ergonomic.
   5.  This probably belongs in a new contrib module, but I think this would lower the barrier to entry, so if you are willing to contribute something I'd be willing to help review.
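
   To make points 0, 1 and 3 concrete, here is a minimal sketch in plain Java of pivoting map-based rows into per-column lists, with strings encoded using an explicit UTF-8 charset. It deliberately uses no Arrow classes: the class name RowsToColumns and its pivot method are hypothetical, and the java.util.List columns are only a stand-in for Arrow vectors, not a proposal for the actual API. It sticks to language features available on JDK 11 and earlier.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Arrow): pivots map-based rows into columns. */
final class RowsToColumns {

    private RowsToColumns() {}

    /**
     * Builds one list per column name, in the order the names are given.
     * A key that is missing from a row becomes null in that column.
     * String values are encoded as UTF-8 bytes with an explicit charset,
     * so the result does not depend on the platform default encoding.
     */
    static Map<String, List<Object>> pivot(List<String> columnNames,
                                           List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String name : columnNames) {
            columns.put(name, new ArrayList<>(rows.size()));
        }
        for (Map<String, Object> row : rows) {
            for (String name : columnNames) {
                Object value = row.get(name);
                if (value instanceof String) {
                    value = ((String) value).getBytes(StandardCharsets.UTF_8);
                }
                columns.get(name).add(value);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("name", "caf\u00e9");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 2); // no "name" key, so that cell is null

        Map<String, List<Object>> columns =
            pivot(Arrays.asList("id", "name"), Arrays.asList(first, second));

        for (Map.Entry<String, List<Object>> column : columns.entrySet()) {
            System.out.println(column.getKey() + ": " + column.getValue().size() + " values");
        }
    }
}
```

   In a real implementation each per-column list would presumably be replaced by the matching Arrow vector, and the object-to-type mapping would need to cover every Arrow data type, which is the open question in point 0.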
   
   

