Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/05 17:59:38 UTC

[GitHub] [arrow] westonpace commented on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

westonpace commented on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832894104


   > From what I understood, Parquet is for storage and arrow for in memory querying
   
   I would leave it at "arrow is for in memory" but yes, you are correct.
   
   > are you planning to offer this on the JS side or that project is mainly for learning only?
   
   You could search the mailing list; that's probably the closest you will come to a project-wide, long-term plan.  However, from a cursory search, I do not see anyone actively working on this feature, and I don't know of any reason it couldn't happen.  I'm not sure what you mean by "learning only".  There are many use cases for JS projects that don't read files from disk: any browser project, for example.
   
   Parquet would be a nice feature, especially for node-based backend servers, but it isn't a necessary feature for data analysis.  For example, there are many visualization libraries written in JS.  These libraries can just accept Arrow data from external applications via IPC and don't need to read it from disk themselves.
   
   > Similarly, it seems in the Python version one can specify partitioning options for writing multiple files which is not present in the JS version but helps when large amounts of data is involved.
   
   This is correct.  This is the "datasets" API, which is not part of the Arrow Columnar Format.  At the moment I think it is limited to the implementations based on C++ (Python, R, Ruby, C/GLib).
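
   To make "partitioning options for writing multiple files" concrete, here is a minimal sketch of the general idea: rows are grouped by one column and each group goes to its own file under a `key=value` directory (the naming often called Hive-style).  This is only an illustration and makes several assumptions: it uses CSV and the Python standard library rather than Arrow or Parquet, and `write_partitioned` and the `part-0.csv` file name are hypothetical names, not part of any Arrow API.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, base_dir, partition_key):
    """Write dict rows to base_dir/<key>=<value>/part-0.csv, one file per key value.

    Hypothetical helper for illustration only; it is not an Arrow function.
    """
    # Group the rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    written = []
    for value in sorted(groups):
        group = groups[value]
        part_dir = Path(base_dir) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        # The partition value lives in the directory name, so the column
        # itself is left out of the file.
        fields = [name for name in group[0] if name != partition_key]
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


if __name__ == "__main__":
    rows = [
        {"year": 2020, "city": "Oslo", "sales": 10},
        {"year": 2021, "city": "Rome", "sales": 7},
        {"year": 2020, "city": "Lima", "sales": 3},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_partitioned(rows, tmp, "year"):
            print(path.relative_to(tmp))
```

   The benefit with large data is that a reader interested in a single `year` only needs to open the files under that one directory instead of scanning everything.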
   
   > Also there seems to be no way of reading JSON data either and provide arrow schema and load it into a table.
   
   There is no implementation that currently reads JSON (https://arrow.apache.org/docs/status.html#third-party-data-formats).  There is nothing preventing it but it has not been a high enough priority for anyone to implement.

