
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/04 15:12:17 UTC

[GitHub] [arrow] ali-habibzadeh opened a new issue #10244: Will similar to Py Arrow ever be the ability to read parquet from disk into arrow?

ali-habibzadeh opened a new issue #10244:
URL: https://github.com/apache/arrow/issues/10244


   From what I understood, Parquet is for storage and Arrow for in-memory querying. Are you planning to offer this on the JS side, or is that project mainly for learning only?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ali-habibzadeh edited a comment on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh edited a comment on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.
   
   Backend apps need to ingest (Arrow is too large compared to Parquet to be a storage strategy), partition, load, query, and deliver, and most of that is missing for this format. From a Node.js standpoint, that makes it more of an introduction to Arrow for learning, not something ready for building serious apps.
   
   Thank you for your detailed answers.



[GitHub] [arrow] nealrichardson closed issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

nealrichardson closed issue #10244:
URL: https://github.com/apache/arrow/issues/10244


   



[GitHub] [arrow] ali-habibzadeh edited a comment on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh edited a comment on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.
   
   Backend apps need to ingest (Arrow is too large compared to Parquet to be a storage strategy), partition, load, query, and deliver, and most of that is missing for this format. From a Node.js standpoint, that makes it more of an introduction to Arrow for learning, not something ready for building serious apps.
   
   Of course, it's also OK to have recipes that combine a multitude of tools to cover application-building patterns using Arrow in Node.js, but from my research so far no such resource or ecosystem exists either. That includes an unsuccessful attempt at integrating with https://github.com/ironSource/parquetjs.
   
   Thank you for your detailed answers.



[GitHub] [arrow] ali-habibzadeh commented on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh commented on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.



[GitHub] [arrow] ali-habibzadeh edited a comment on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh edited a comment on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.
   
   Backend apps need to ingest (Arrow is too large compared to Parquet to be a storage strategy), partition, load, query, and deliver, and most of that is missing for this format. From a Node.js standpoint, that makes it more of an introduction to Arrow for learning, not something ready for building serious apps.



[GitHub] [arrow] ali-habibzadeh edited a comment on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh edited a comment on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.
   
   Backend apps need to ingest (Arrow is too large compared to Parquet to be a storage strategy), partition, load, query, and deliver, and most of that is missing for this format.



[GitHub] [arrow] ali-habibzadeh edited a comment on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh edited a comment on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.
   
   Backend apps need to ingest, partition, load, query, and deliver, and most of that is missing for this format.



[GitHub] [arrow] ali-habibzadeh edited a comment on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

ali-habibzadeh edited a comment on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832942422


   Thanks. Confirmed my thoughts. For a Node.js serverless or backend application, this is not an option as a query engine. It's just for browser-based things.
   
   Backend apps need to ingest (Arrow is too large compared to Parquet to be a storage strategy), partition, load, query, and deliver, and most of that is missing for this format. From a Node.js standpoint, that makes it more of an introduction to the Arrow project for learning, not something for building serious apps.
   
   Of course, it's also OK to have recipes that combine a multitude of tools to cover application-building patterns using Arrow in Node.js, but from my research so far no such resource or ecosystem exists either. That includes an unsuccessful attempt at integrating with https://github.com/ironSource/parquetjs.
   
   Thank you for your detailed answers.



[GitHub] [arrow] westonpace commented on issue #10244: Will JS similar to Py Arrow ever have the ability to read parquet from disk into arrow?

Posted by GitBox <gi...@apache.org>.

westonpace commented on issue #10244:
URL: https://github.com/apache/arrow/issues/10244#issuecomment-832894104


   > From what I understood, Parquet is for storage and arrow for in memory querying
   
   I would leave it at "Arrow is for in-memory", but yes, you are correct.
   
   > are you planning to offer this on the JS side or that project is mainly for learning only?
   
   You could search the mailing list; that's probably the closest you will come to a project-wide long-term plan.  However, from a cursory search, I do not see anyone actively working on this feature.  I don't know of any reason it couldn't happen.  I'm not sure what you mean by "learning only".  There are many use cases for JS projects that don't read files from disk, for example, any browser project.
   
   Parquet would be a nice feature, especially for node-based backend servers, but it isn't a necessary feature for data analysis.  For example, there are many visualization libraries written in JS.  These libraries can just accept Arrow data from external applications via IPC and don't need to read it from disk themselves.
   
   > Similarly, it seems in the Python version one can specify partitioning options for writing multiple files which is not present in the JS version but helps when large amounts of data is involved.
   
   This is correct.  This is the "datasets" API. It is not part of the Arrow Columnar Format, and at the moment I think it is limited to the implementations based on C++ (Python, R, Ruby, C/GLib).
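
To make the partitioning idea concrete, here is a minimal sketch in plain TypeScript with no Arrow dependency. It only shows the grouping step: rows are bucketed by a Hive-style `key=value` directory path, which is one of the layouts the Python datasets API can produce. The `Row` type and the `partitionPath` and `partitionRows` helpers are hypothetical names made up for illustration; they are not part of Arrow JS.

```typescript
// Hypothetical row shape for illustration; not an Arrow JS type.
type Row = Record<string, string | number>;

// Build a Hive-style directory path such as "year=2021/month=5".
// Values are URI-encoded so they are safe to use as path segments.
function partitionPath(row: Row, keys: string[]): string {
  return keys
    .map((k) => `${k}=${encodeURIComponent(String(row[k]))}`)
    .join("/");
}

// Group rows by their partition path. Each group would then be
// written to its own file by whatever writer is available.
function partitionRows(rows: Row[], keys: string[]): Map<string, Row[]> {
  const groups = new Map<string, Row[]>();
  for (const row of rows) {
    const path = partitionPath(row, keys);
    const bucket = groups.get(path);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(path, [row]);
    }
  }
  return groups;
}

const sampleRows: Row[] = [
  { year: 2021, month: 5, value: 1.5 },
  { year: 2021, month: 5, value: 2.5 },
  { year: 2020, month: 12, value: 3.0 },
];
const groups = partitionRows(sampleRows, ["year", "month"]);
```

With the sample rows above, the two May 2021 rows share one partition and the December 2020 row gets its own. Writing each group out (as Arrow IPC, Parquet, or anything else) is deliberately left out, since that part depends on which libraries are installed.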
   
   > Also there seems to be no way of reading JSON data either and provide arrow schema and load it into a table.
   
   There is no implementation that currently reads JSON (https://arrow.apache.org/docs/status.html#third-party-data-formats).  There is nothing preventing it but it has not been a high enough priority for anyone to implement.
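
As a rough illustration of what "read JSON against a schema" involves, the sketch below parses a JSON array of row objects and pivots it into one array per column, checking each value against a declared type. It is plain TypeScript and does not use Arrow JS at all; `Field`, `FieldType`, `coerce`, and `jsonToColumns` are hypothetical names invented for this example. The resulting column arrays could then be handed to whichever Arrow builder API the installed version provides.

```typescript
// Hypothetical, simplified schema description; not Arrow's Schema class.
type FieldType = "int" | "float" | "utf8" | "bool";
interface Field {
  name: string;
  type: FieldType;
}
type Cell = number | string | boolean | null;

// Validate one JSON value against the declared type.
// Missing or null values become null.
function coerce(value: unknown, type: FieldType): Cell {
  if (value === null || value === undefined) return null;
  if (type === "utf8") return String(value);
  if (type === "bool") {
    if (typeof value !== "boolean") throw new TypeError(`expected bool, got ${typeof value}`);
    return value;
  }
  // Remaining types are numeric: "int" or "float".
  if (typeof value !== "number") throw new TypeError(`expected number, got ${typeof value}`);
  if (type === "int" && !Number.isInteger(value)) throw new TypeError(`expected int, got ${value}`);
  return value;
}

// Parse a JSON array of row objects and pivot it into columns,
// keeping only the fields named in the schema.
function jsonToColumns(jsonText: string, schema: Field[]): Record<string, Cell[]> {
  const rows = JSON.parse(jsonText) as Record<string, unknown>[];
  const columns: Record<string, Cell[]> = {};
  for (const field of schema) columns[field.name] = [];
  for (const row of rows) {
    for (const field of schema) {
      columns[field.name].push(coerce(row[field.name], field.type));
    }
  }
  return columns;
}

const exampleSchema: Field[] = [
  { name: "id", type: "int" },
  { name: "name", type: "utf8" },
];
const exampleColumns = jsonToColumns('[{"id": 1, "name": "a"}, {"id": 2}]', exampleSchema);
```

Here the second row has no `name`, so the `name` column ends up with a null in that slot. Nested types, dates, and streaming input are out of scope for a sketch this small.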

