You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/04/11 18:47:38 UTC

[GitHub] [arrow] paleolimbot opened a new issue, #35049: [R] Improve configure script for contributor experience

paleolimbot opened a new issue, #35049:
URL: https://github.com/apache/arrow/issues/35049

   ### Describe the enhancement requested
   
   There are a few things that I've noticed over the last few months of working with/installing Arrow and the R package on various platforms. I think some of these may have come up in previous tickets and as I go through actually trying some of these I will attempt to locate them.
   
   The main idea is that when a contributor does a git clone of Arrow and opens up the package in RStudio or VScode, the static nightly builds of Arrow built for the R package should be sufficient for a user to run `devtools:test()`. When I was debugging the "files are kept open too long on Windows" problem I used that strategy to avoid setting up a Windows Arrow build and it worked great! (Maybe this is even documented already...if it is, moving that documentation to a more prominent place may be helpful).
   
   Another common thing I've run up against is that the configure script enables `pkg-config` by default. The build of Arrow resolved by pkg-config is almost never the right one...it's usually too old or has the wrong features enabled. At the very least we need to check the version of whatever gets resolved by pkg-config and maybe we should consider disabling it by default. This is a particular issue with homebrew.
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1525677082

   > All good points! For Mac I think it's OK that we don't have a binary as long as it doesn't try to build libarrow again every single time (and as long as the build "just works", which it should since we test it on CI all the time).
   
   The autobrew script is very conservative/paranoid about ensuring success on CRAN machines, so it installs everything into a tempdir and even pulls a copy `brew` every time. But if you strip out that, I think it boils down to:
   
   ```
   brew tap autobrew/cran
   brew install --HEAD apache-arrow-static
   ```
   
   which should almost always work, and if you point it at a copy of the formula from the checkout, should be guaranteed to work (because that's what we test nightly). For that matter though, you could just use the regular homebrew formula, without the autobrew tap, as long as we can guarantee that we don't have version mismatch. So either way, doing that version check more strictly is the first step. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1523531216

   I've dug into the configure script a bit in https://github.com/apache/arrow/pull/35147 and have a few thoughts:
   
   * We should remove the special `brew` case. Seems it would only be used if you have `apache-arrow` installed via brew but don't have `pkg-config`. If you want to use the brew libraries, either install `pkg-config` or specify `ARROW_HOME=$(brew --prefix apache-arrow)`.
   * If we use pkg-config to find libraries, we should do a strict version check like we do in other build scripts (allowing x.y.z.1 to match x.y.z but disallowing x.y.z.9000) and reject using the found system libraries if they don't match, rather than just warn. Set `ARROW_USE_PKG_CONFIG=true` to use them anyway.
   
   These don't solve the developer experience from a checkout entirely, but they eliminate some guns pointed at our feet. nixlibs.R will download a nightly binary if you're on .9000, but that's just for linux, and even then there are issues: depending on how you install, it may download every time; if it doesn't download every time, we'll need to provide instruction on how to keep it up to date if you contribute again in the future. 
   
   On mac we don't host nightly libarrow binaries (bottles), so we can't just use them. The out of the box developer experience should just work, with no more brew conflicts, but it would be slow. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1526001683

   Another thought I had while 🚶 🐕 : we could detect that we're in a git checkout, and if so, instead of doing autobrew, do the bundled build. And when the bundled build happens in a git checkout, we could make it build into a (gitignored) directory inside the project. Then we would add that location to our places to look for arrow already on the system, which we'd use, assuming it passes the version check. So the first time, it would build, and the next time you installed the R package, it would pick up the C++ build and not have to redo it.
   
   The downside to going farther down this direction is that we will need better tooling or at least instructions for what to do when *this* build gets out of sync, but even if the instruction is `rm -rf`, it would solve the problem of iterating on a single branch at one time. For anyone who is a repeat contributor who would run into the sync issues, we should probably be nudging them towards maintaining a proper C++ build anyway, so this is for the one-time fix use case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1507431068

   When I'm going through this for real I will double check, but on my machine I have `pkg-config` on PATH, which a little too happily resolves to the homebrew install of Arrow. This was the issue Lionel had when attempting to do the dplyr PR.
   
   There is an open ticket somewhere about using "arrow" vs "arrow-dataset" as the pkg-config name, too (arrow-dataset is more likely to resolve to the feature set we need).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1524368699

   All good points! For Mac I think it's OK that we don't have a binary as long as it doesn't try to build libarrow again every single time (and as long as the build "just works", which it should since we test it on CI all the time).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1507401507

   I think the homebrew issue is separate from pkg-config. The brew check is after the pkg-config path: https://github.com/apache/arrow/blob/main/r/configure#L128 and unlike the pkg-config block above it, there is no check/warning about version mismatch. We should probably only use the homebrew library if there is a version match. Probably also ok if we wanted to make homebrew apache-arrow opt-in via environment variable.
   
   Possibly unrelated and just FYI, we do use `pkg-config` elsewhere (later) in configure to read the `.pc` files to know which other `-l` lib flags need to be added. This is used even in the bundled static build. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1526552124

   Well I've been thoroughly nerd-sniped 🤣  I'm hacking away at `configure` in https://github.com/apache/arrow/pull/35147 now, I think I will stop short of the changes above about the casual developer experience, but it should hopefully be easier to add and reason about after I'm done rewriting. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1531622565

   I left a TODO for this in https://github.com/apache/arrow/pull/35147. It would be interesting to test out what the new/casual contributor experience is now with those changes--it's possible that the situation is not so bad anymore--though I don't personally have time to push farther on it right now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1507441001

   Sure, I believe you (and I'm not going to `brew install apache-arrow` and mess my env up to find out 😹). I don't think anything has ever been removed from the `configure` script, only added, so there's lots of if-elses in there and it's possible some aren't being reached often or ever. It's ripe for a (careful) refactor, for sure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #35049: [R] Improve configure script for contributor experience

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #35049:
URL: https://github.com/apache/arrow/issues/35049#issuecomment-1509122718

   Related: #35140


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org