Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/09/25 18:40:25 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #7652: Support compiling remaining DataFusion crates (`datafusion-core`) to WASM

alamb opened a new issue, #7652:
URL: https://github.com/apache/arrow-datafusion/issues/7652

   ### Is your feature request related to a problem or challenge?
   
   As shown by @jonmmease in https://github.com/apache/arrow-datafusion/pull/7633, some of the DataFusion crates can already be compiled to WASM:
   
   ```
   datafusion-common
   datafusion-expr
   datafusion-optimizer
   datafusion-physical-expr
   datafusion-sql
   ```
   
   The difficulty with getting the remaining DataFusion crates compiled to WASM is that they have non-optional dependencies on the [`parquet`](https://docs.rs/crate/parquet/) crate with its default features enabled. Several of the default parquet crate features require native dependencies that are not compatible with WASM, in particular the `lz4` and `zstd` features. If we can arrange our feature flags to make it possible to depend on parquet with these features disabled, then it should be possible to compile the core `datafusion` crate to WASM as well.
   
   ### Describe the solution you'd like
   
   One approach might be to disable the parquet features that cannot be compiled to WASM, as discussed in the exchange quoted below.
   
   
   From the discussion between @jonmmease and @tustvold at https://github.com/apache/arrow-datafusion/pull/7633/files#r1335824930:
   
   ```
   @tustvold do you have any thoughts about finagling the parquet crate's dependencies so it can compile, by default, on wasm? Should we perhaps change datafusion to disable the parquet default features?
   
    tustvold 
   
   IIRC it is the compression codecs that have issues with WASM, disabling these by default I think would be surprising for users. Further I'm not sure how useful parquet support would be given that only InMemory object_store is supported on WASM, although I may have some time to look into this over the next couple of days
   
    jonmmease 
   
   Yeah, I don't think we'd want DataFusion's default build to disable the default parquet features, but if we could arrange things so that depending on the datafusion core crate with default-features=false would either remove the parquet dependency altogether, or disable the default parquet features, then I think we could get things at least compiling for wasm.
   ```
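
   Inside a crate, the `default-features=false` idea discussed above could look roughly like the following. This is only a sketch: the `parquet` Cargo feature name and the `supported_formats` function are hypothetical and used for illustration; they are not taken from DataFusion's actual API.

   ```rust
   /// Hypothetical helper listing the file formats compiled into this build.
   /// Assumes the crate declares an optional Cargo feature named `parquet`.
   pub fn supported_formats() -> Vec<&'static str> {
       // Formats assumed (for this sketch) to need no native dependencies.
       #[allow(unused_mut)]
       let mut formats = vec!["csv", "json"];

       // Compiled only when the `parquet` feature is enabled, so a build with
       // default features disabled never references parquet code here.
       #[cfg(feature = "parquet")]
       formats.push("parquet");

       formats
   }
   ```

   With a gate like this, code that needs the parquet crate simply does not exist in a build where the feature is turned off, which is what would let a WASM build skip the problematic codecs.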
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Support compiling remaining DataFusion crates (`datafusion-core`) to WASM [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #7652:
URL: https://github.com/apache/arrow-datafusion/issues/7652#issuecomment-1779960469

   Also, https://github.com/apache/arrow-datafusion/pull/7745 makes parquet support optional in DataFusion




[GitHub] [arrow-datafusion] universalmind303 commented on issue #7652: Support compiling remaining DataFusion crates (`datafusion-core`) to WASM

Posted by "universalmind303 (via GitHub)" <gi...@apache.org>.
universalmind303 commented on issue #7652:
URL: https://github.com/apache/arrow-datafusion/issues/7652#issuecomment-1735779275

   this is also likely blocked upstream by arrow-rs which currently doesn't have a `parquet` feature flag.
   
   see https://github.com/apache/arrow-rs/issues/4764




[GitHub] [arrow-datafusion] alamb commented on issue #7652: Support compiling remaining DataFusion crates (`datafusion-core`) to WASM

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #7652:
URL: https://github.com/apache/arrow-datafusion/issues/7652#issuecomment-1736195430

   A good first step might be to simply make parquet optional in DataFusion -- aka https://github.com/apache/arrow-datafusion/issues/7653
   
   That would allow us to validate and explore what dependencies are blocking wasm compilation




Re: [I] Support compiling remaining DataFusion crates (`datafusion-core`) to WASM [arrow-datafusion]

Posted by "fudini (via GitHub)" <gi...@apache.org>.
fudini commented on issue #7652:
URL: https://github.com/apache/arrow-datafusion/issues/7652#issuecomment-1902131391

   I managed to compile for wasm, but I encountered a couple of problems:
   1. Stack overflow at `SessionContext::new`
   2. Use of `std::time::Instant` - this won't compile and probably needs to be hidden behind cfg
   https://github.com/apache/arrow-datafusion/compare/main...fudini:arrow-datafusion:wasm
   
   After these changes I was able to create `SessionContext` and run a simple query
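
   One way to hide `std::time::Instant` behind `cfg`, as suggested in item 2, is sketched below. The `clock` module and `Stopwatch` type are hypothetical names for illustration and are not taken from the linked branch; the sketch also assumes that reporting a zero duration is acceptable on `wasm32`, where `Instant` is not usable on the `wasm32-unknown-unknown` target.

   ```rust
   /// Native targets: a thin wrapper around `std::time::Instant`.
   #[cfg(not(target_arch = "wasm32"))]
   pub mod clock {
       use std::time::{Duration, Instant};

       pub struct Stopwatch(Instant);

       impl Stopwatch {
           /// Record the current time.
           pub fn start() -> Self {
               Stopwatch(Instant::now())
           }

           /// Time elapsed since `start` was called.
           pub fn elapsed(&self) -> Duration {
               self.0.elapsed()
           }
       }
   }

   /// wasm32 targets: same interface, but never touches `Instant`.
   #[cfg(target_arch = "wasm32")]
   pub mod clock {
       use std::time::Duration;

       pub struct Stopwatch;

       impl Stopwatch {
           /// No clock is read on this target.
           pub fn start() -> Self {
               Stopwatch
           }

           /// Always reports zero (an assumption made for this sketch).
           pub fn elapsed(&self) -> Duration {
               Duration::ZERO
           }
       }
   }
   ```

   Callers use `clock::Stopwatch` on every target, so the `cfg` stays in one place instead of being repeated at each timing call site.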




[GitHub] [arrow-datafusion] tustvold commented on issue #7652: Support compiling remaining DataFusion crates (`datafusion-core`) to WASM

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #7652:
URL: https://github.com/apache/arrow-datafusion/issues/7652#issuecomment-1742062528

   https://github.com/apache/arrow-rs/pull/4884 makes parquet compile for WASM

