You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/26 10:01:11 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue #80: Use released version of arrow

tustvold opened a new issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80


   Currently the Datafusion crates pin a version of arrow by git revision. Unfortunately this makes it complicated to use Datafusion with other libraries that also depend on arrow, making it is necessary to update all the version pins in lock-step across potentially multiple repositories. #39 helps somewhat with this, but doesn't help with crates that Datafusion isn't dependent on that themselves depend on arrow, or if you want to configure different feature flags from what Datafusion sets.
   
   Unfortunately there is an outstanding [bug /feature omission](https://github.com/rust-lang/cargo/issues/5478) with cargo that makes patch not work for overriding git version pins, and the workarounds on the ticket no longer seem to work.
   
   If Datafusion instead depended instead on a released version of arrow, Cargo's semver compatibility rules would avoid a lot of this pain for most users, and would allow users to use the patch syntax to override the arrow version in use for all their dependencies within their workspace, if they want to use an unreleased arrow version.
   
   I'm not sure how feasible any of this is, but thought I would raise a ticket to maybe kickstart a discussion on this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-871821316


   @alamb this can be closed now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-826823381


   @tustvold Does https://github.com/apache/arrow-datafusion/pull/39 help with this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
alamb closed issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-872127685


   Indeed -- thanks @Jimexist  -- this issue was closed in https://github.com/apache/arrow-datafusion/issues/393 I think


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-826930734


   > but possibly somewhat tangential to making DataFusion as a library easier to consume
   
   I agree.
   
   My point is that the reason we use pinned hashes of arrow is so that we do not have to wait for a new release. So, I _think_  that to stop pinning in DataFusion, we need to release arrow more frequently. But I agree that from the consumers' point of view, it is not needed, as you can just point to a hash in `arrow-rs` ^_^
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-826900252


   I think that the general problem is that we pin arrow (and many others) in datafusion; datafusion is a library and it should thus avoid pinning dependencies.
   
   Instead, it should bracket them, via e.g. `^3.0.0`, so that consumers of the library may use a different version of any of its dependencies, for as long as they are compatible, and have cargo find a valid dependency version between what the consumer wants and what datafusion requires.
   
   As it stands, consumers must use the exact same version of arrow that datafusion uses or cargo will pick two different arrow versions. This happens because Cargo cannot guarantee that the two different arrow versions (what datafusion demands and what the consumer wants) are ABI compatible. Consumers can't pass structs from a version of arrow (that they use) to another version of arrow (that datafusion uses).
   
   Note that in this context a different feature set corresponds to a different version, as cargo has no way of knowing whether a feature will retain ABI compatibility.
   
   So, I think the ask here is:
   
   1. datafusion writes something like `^4.0.0` on its `cargo.toml`
   2. we release arrow more often and strictly according to semver
   
   Is this the idea, @tustvold ? 
   
   I am not sure whether we could get away with the [multiple paths](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#multiple-locations) approach, e.g.
   
   ```
   arrow = { git = "https://github.com/arrow-rs/arrow", version = "^3.0.0" }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-826861064


   > @tustvold Does #39 help with this?
   
   It helps, but doesn't solve the underlying issue. If you depend on another crate that in turn depends on arrow and isn't exposed by DataFusion, e.g. arrow-flight, or you want to set different features from what DataFusion sets, you end up having to replicate the exact version pins from DataFusion into all other crates


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #80: Use released version of arrow

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #80:
URL: https://github.com/apache/arrow-datafusion/issues/80#issuecomment-826914705


   > datafusion writes something like ^4.0.0 on its cargo.toml
   
   Yes, this would be my ideal for the reasons you articulated.
   
   > we release arrow more often and strictly according to semver
   
   This would definitely be something I'd be supportive of, but possibly somewhat tangential to making DataFusion as a library easier to consume. Cargo has a good story for overriding dependencies within a workspace, including indirect dependencies, provided those versions aren't pinned within the libraries. Therefore if DataFusion were to move to a released version of arrow it wouldn't preclude users from opting-in to newer, potentially unreleased versions of arrow. 
   
   However, DataFusion itself would not be able to opt-in to unreleased arrow functionality, and so if there are frequent DataFusion changes coupled with arrow changes, then yes a more frequent arrow release cycle would possibly be a pre-condition of moving to using a released version of arrow.
   
   > I am not sure whether we could get away with the multiple paths approach, e.g.
   
   I've not come across this approach, I'd worry that it might be vulnerable to https://github.com/rust-lang/cargo/issues/5478 which would prevent users from opting into newer versions of arrow within their workspaces, which imo would be unfortunate
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org