Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/09 04:36:05 UTC

[GitHub] [arrow-datafusion] wjones127 opened a new issue, #2185: ObjectStore write support

wjones127 opened a new issue, #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   We are looking at improving the filesystem / object store support in delta-rs, but it seems better to do that work inside datafusion's data-access crate than to do all of it in delta-rs. delta-rs currently has filesystem support for local fs, GCS, S3, and ADLS, but it only reads and writes whole files. I think we'll want to add streaming reads and writes.
   
   **Describe the solution you'd like**
   
   Design and implement a streaming write interface into the `ObjectStore` trait.
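
A minimal sketch of what such a streaming write interface could look like, using only the standard library. The trait name `StreamingStore`, its methods, and the `MemoryStore` backend are hypothetical illustrations, not the actual `ObjectStore` trait or any API proposed in this issue:

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical streaming-write interface (illustrative only, not DataFusion's trait).
pub trait StreamingStore {
    /// Sink that accepts the object's bytes chunk by chunk.
    type Writer: Write;

    /// Begin writing the object stored under `path`.
    fn start_write(&mut self, path: &str) -> io::Result<Self::Writer>;

    /// Publish the finished object; it is not readable before this call.
    fn finish_write(&mut self, path: &str, writer: Self::Writer) -> io::Result<()>;

    /// Read the whole object back.
    fn read_all(&self, path: &str) -> io::Result<Vec<u8>>;
}

/// Toy in-memory backend standing in for S3, GCS, ADLS, or a local disk.
#[derive(Default)]
pub struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl StreamingStore for MemoryStore {
    // `Vec<u8>` implements `Write`, so it can serve as the staging buffer.
    type Writer = Vec<u8>;

    fn start_write(&mut self, _path: &str) -> io::Result<Vec<u8>> {
        Ok(Vec::new())
    }

    fn finish_write(&mut self, path: &str, writer: Vec<u8>) -> io::Result<()> {
        self.objects.insert(path.to_string(), writer);
        Ok(())
    }

    fn read_all(&self, path: &str) -> io::Result<Vec<u8>> {
        self.objects.get(path).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("no object at {}", path))
        })
    }
}

/// Stream `chunks` into `store` one at a time, then publish the object.
pub fn write_chunks<S: StreamingStore>(
    store: &mut S,
    path: &str,
    chunks: &[&[u8]],
) -> io::Result<()> {
    let mut writer = store.start_write(path)?;
    for chunk in chunks {
        writer.write_all(chunk)?;
    }
    store.finish_write(path, writer)
}
```

Splitting the write into a start and a finish step is one possible design choice: it gives a remote backend a natural place to begin and to complete a multipart upload.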
   
   
   **Describe alternatives you've considered**
   
   We could do that work in delta-rs and then contribute it back here later, but it might not transfer well. For example, the current delta-rs S3 filesystem uses rusoto, while the datafusion object store uses the AWS SDK.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Cheappie commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
Cheappie commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1093776147

   I wonder what quality of write support you plan to provide? A production-ready implementation of data ingestion can be as large an effort as creating another project like Apache Kafka.




[GitHub] [arrow-datafusion] wjones127 commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
wjones127 commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1263698537

   Yes! I'm happy to close this, and other issues can be filed for any further integration work.




[GitHub] [arrow-datafusion] andygrove commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1263675594

   This issue is a little out of date. We recently switched to a new object store crate and it appears to support writes.
   
   https://docs.rs/object_store/0.5.0/object_store/trait.ObjectStore.html




[GitHub] [arrow-datafusion] xudong963 commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
xudong963 commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1093709629

   related to https://github.com/apache/arrow-datafusion/issues/2025




[GitHub] [arrow-datafusion] Cheappie commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
Cheappie commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1094911631

   First of all, I am just a stranger evaluating the datafusion query engine, so I might lack some context and my point might not be valid for this case.
   
   Yes, sure, that makes sense. From what I can see, a writer API adds a point of failure upstream. For example, how is it going to deal with data loss when the process crashes, or with missing write permissions on the S3 bucket? An ObjectStore that only performs reads cannot corrupt the data source, and from my perspective that is great. I would suggest pushing this cross-filesystem implementation into the Rust Arrow repository, as C++ did; then the implementation would be even more reusable.
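
To make the crash and permission concerns concrete, here is a sketch of one common mitigation for a local-filesystem backend: stage the bytes in a temporary file and publish them with a rename, so a crash mid-write leaves the destination untouched, and a permission error surfaces as an `Err` before anything is replaced. The helper name `write_atomically` and the `.tmp` suffix are assumptions made for illustration; nothing in this thread says any object store implementation works this way.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write everything to `tmp`, flush it to disk, then move it over `dest`.
fn stage_and_publish(tmp: &Path, dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut file = File::create(tmp)?; // fails here on missing permissions
    file.write_all(bytes)?;
    file.sync_all()?; // make sure the data is on disk before publishing
    fs::rename(tmp, dest) // replaces `dest` in one step on POSIX filesystems
}

/// Hypothetical helper: readers see either the old file or the complete new one.
pub fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Stage next to the destination so the rename stays on one filesystem.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = dest.with_file_name(tmp_name);

    let result = stage_and_publish(&tmp, dest, bytes);
    if result.is_err() {
        // Best-effort cleanup of the partial staging file.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

Remote object stores need a different mechanism (for example, multipart uploads that only become visible once completed), so this sketch covers the local case only.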




[GitHub] [arrow-datafusion] wjones127 commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
wjones127 commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1094096751

   > I wonder what quality of write support you plan to provide?
   
   Basically, I would like for `ObjectStore` to be the Rust Datafusion equivalent of [Arrow C++'s FileSystem](https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/filesystem.h) or Python's [fsspec](https://filesystem-spec.readthedocs.io/en/latest/). They provide a common interface to various object stores (S3, GCS, ADLS, HDFS, etc.) so that various projects implementing readers and writers (such as delta-rs) can simply use those filesystems instead of taking on the burden of writing and maintaining all those abstractions themselves.
   
   > A production-ready implementation of data ingestion can be as large an effort as creating another project like Apache Kafka.
   
   This is just the "filesystem" interaction, so just reading and writing bytes to various places with a uniform API. Other "writer" related things like file formats (parquet / json / csv) would be out of scope. Does that make sense?
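
As a rough illustration of "reading and writing bytes to various places with a uniform API", the sketch below defines a tiny byte-level trait with two backends. All names here (`ByteStore`, `InMemory`, `LocalDir`, `copy_object`) are hypothetical and chosen for this example; they are not the `ObjectStore` trait or the Arrow C++ / fsspec interfaces.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical minimal byte-level interface; file formats stay out of scope.
pub trait ByteStore {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Vec<u8>>;
}

/// Backend that keeps objects in a map, useful for tests.
#[derive(Default)]
pub struct InMemory {
    objects: HashMap<String, Vec<u8>>,
}

impl ByteStore for InMemory {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        self.objects.insert(key.to_string(), bytes.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        self.objects
            .get(key)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("no object at {}", key)))
    }
}

/// Backend that maps keys to files under a root directory.
pub struct LocalDir {
    root: PathBuf,
}

impl LocalDir {
    pub fn new(root: PathBuf) -> Self {
        LocalDir { root }
    }
}

impl ByteStore for LocalDir {
    fn put(&mut self, key: &str, bytes: &[u8]) -> io::Result<()> {
        let path = self.root.join(key);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?; // create intermediate "directories"
        }
        fs::write(path, bytes)
    }

    fn get(&self, key: &str) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(key))
    }
}

/// Code written against the trait works with any backend, in either direction.
pub fn copy_object(src: &dyn ByteStore, dst: &mut dyn ByteStore, key: &str) -> io::Result<()> {
    let bytes = src.get(key)?;
    dst.put(key, &bytes)
}
```

A reader or writer library would depend only on the trait, so supporting another store means adding one more `impl` rather than touching the callers.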




[GitHub] [arrow-datafusion] matthewmturner commented on issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on issue #2185:
URL: https://github.com/apache/arrow-datafusion/issues/2185#issuecomment-1094037502

   thanks, @xudong963 and @wjones127.  very happy to see this. also relates to #1777.




[GitHub] [arrow-datafusion] wjones127 closed issue #2185: ObjectStore write support

Posted by GitBox <gi...@apache.org>.
wjones127 closed issue #2185: ObjectStore write support
URL: https://github.com/apache/arrow-datafusion/issues/2185

