
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/03 23:30:30 UTC

[GitHub] [iceberg] mayursrivastava commented on pull request #2286: Add Arrow vectorized reader

mayursrivastava commented on pull request #2286:
URL: https://github.com/apache/iceberg/pull/2286#issuecomment-790149134


   Thanks for looking into it @rymurr 
   
   I'm looking to integrate this with our existing Apache Arrow/Flight service and with some internal services that use VectorSchemaRoot and Arrow Field Vectors directly. Using ColumnBatch or a similar class would require us to somehow get access to the underlying Arrow data structures (VectorSchemaRoot or Arrow Field Vectors). On the lifecycle question, I agree: Arrow Field Vectors and VectorSchemaRoot have a lifecycle (currently handled by the reader) and need careful handling by the user. So, yes, this is for advanced users.
   
   If we want to provide more user-friendly accessors in addition to VectorSchemaRoot or the Field Vectors, we can try moving the accessor classes from Spark to the iceberg-arrow module. We would still need to expose the Arrow internal data structures, though, so that advanced users can work with them directly.
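
To make the trade-off concrete, here is a minimal sketch of the "friendly accessors plus escape hatch" shape discussed above. Everything in it is hypothetical: `RawVector` is a plain-Java stand-in for an Arrow Field Vector (or VectorSchemaRoot), and `FriendlyBatch` stands in for an accessor class like the ones currently in Spark. Neither is an Iceberg or Arrow API. It only illustrates why handing out the underlying structure pushes the lifecycle concern onto the caller.

```java
import java.util.Arrays;

final class BatchAccessSketch {

  /** Hypothetical stand-in for an Arrow vector: it owns a buffer that must be released. */
  static final class RawVector implements AutoCloseable {
    private int[] buffer;

    RawVector(int... values) {
      this.buffer = Arrays.copyOf(values, values.length);
    }

    int valueCount() {
      return checkOpen().length;
    }

    int get(int index) {
      return checkOpen()[index];
    }

    boolean isClosed() {
      return buffer == null;
    }

    private int[] checkOpen() {
      if (buffer == null) {
        throw new IllegalStateException("vector is closed");
      }
      return buffer;
    }

    @Override
    public void close() {
      buffer = null; // release the buffer; closing twice is harmless
    }
  }

  /** Hypothetical friendly wrapper: simple accessors plus an escape hatch. */
  static final class FriendlyBatch implements AutoCloseable {
    private final RawVector vector;

    FriendlyBatch(RawVector vector) {
      this.vector = vector;
    }

    int numRows() {
      return vector.valueCount();
    }

    int getInt(int row) {
      return vector.get(row);
    }

    /** Escape hatch for advanced users. The batch still owns the vector. */
    RawVector unwrap() {
      return vector;
    }

    @Override
    public void close() {
      vector.close(); // the wrapper (like the reader today) handles the lifecycle
    }
  }

  public static void main(String[] args) {
    RawVector raw = new RawVector(1, 2, 3);
    try (FriendlyBatch batch = new FriendlyBatch(raw)) {
      System.out.println(batch.getInt(1));          // friendly accessor, prints 2
      System.out.println(batch.unwrap() == raw);    // direct access, prints true
    }
    // The unwrapped reference is no longer usable once the batch is closed.
    System.out.println(raw.isClosed());             // prints true
  }
}
```

The point of the sketch is the last line: a reference obtained through `unwrap()` outlives the batch only as a dangling handle, which is the "careful handling by the user" that direct access to the Arrow structures implies.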
   

