Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/12 18:53:48 UTC
[GitHub] [arrow] StefanKarpinski opened a new pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo
StefanKarpinski opened a new pull request #8448:
URL: https://github.com/apache/arrow/pull/8448
This pull request merges a synthetic history of the Arrow.jl Julia package into the main arrow monorepo under the `julia` top-level directory. The history of Arrow.jl has been rewritten so that it appears that all development was done in this directory, retaining only a commit for each published version of the Arrow.jl package. Preserving this history (specifically the git tree objects associated with each commit) allows Julia's package manager to continue to install historical versions of Arrow.jl while having the arrow monorepo as the git repository of record going forward.
I'm making this pull request on behalf of the Arrow.jl project (cc @quinnj, @ExpandingMan) as the resident git mage. Let me know if there's anything I should change about this PR to integrate better into the arrow project.
---
For my own record (in case I need to do this again), here's the code I used to generate the synthetic history:
```jl
using TOML

# Registered versions of Arrow.jl and the git tree hash recorded for each.
data = TOML.parse("""
["0.1.2"]
git-tree-sha1 = "5cab061e3fcf0d78291f9c4b3db1f58c8f5e1bc5"
["0.2.0"]
git-tree-sha1 = "5081382c0e5c78c1849b9841b9d8941437060b48"
["0.2.1"]
git-tree-sha1 = "ecfe11bd0874ab41b78be0ca8d0f680ba37978dc"
["0.2.2"]
git-tree-sha1 = "c66fc3e71747c99a3e3940ade685c0d8ea66c0ae"
["0.2.3"]
git-tree-sha1 = "d3c36842140057276f6f8348afa08f0f7dae2d1e"
["0.2.4"]
git-tree-sha1 = "c86df6ed41b3bd192d663e5e0e7cac0d11fd4375"
["0.3.0"]
git-tree-sha1 = "76641f71ac332cd4d3cf54b98234a0f597bd7a2f"
""")

trees = Dict(VersionNumber(k) => v["git-tree-sha1"] for (k, v) in data)

ENV["GIT_AUTHOR_NAME"] = "Jacob Quinn"
ENV["GIT_AUTHOR_EMAIL"] = "quinn.jacobd@gmail.com"

# Chain one synthetic commit per version, oldest first, onto the starting commit.
let commit = "16b729db74d78ecb010efab855c9e46c8052f59e"
    for (ver, tree) in sort!(collect(trees), by=first)
        message = """
        ARROW-10228: [Julia] Arrow.jl v$ver
        Co-authored-by: Michael Savastio <sa...@gmail.com>
        """
        # Reuse the author date of the original release tag.
        ENV["GIT_AUTHOR_DATE"] = readchomp(`git show -s --format=%ai v$ver`)
        commit = readchomp(`git commit-tree -p $commit -m $message $tree`)
    end
    run(`git branch -f sk/synthetic $commit`)
end

# Rewrite the history so every tree lives under the `julia` subdirectory.
run(`git filter-repo --force --to-subdirectory-filter julia`)
```
Then I used the following `.git/config` in a clone of `arrow`:
```gitconfig
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = https://github.com/apache/arrow.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[remote "StefanKarpinski"]
	url = https://github.com/StefanKarpinski/arrow.git
	fetch = +refs/heads/*:refs/remotes/StefanKarpinski/*
[remote "Arrow.jl"]
	url = ../Arrow.jl
	fetch = +refs/heads/*:refs/remotes/Arrow.jl/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
```
With that setup, you just do this in the `arrow` clone:
```sh
git fetch Arrow.jl --no-tags
git merge Arrow.jl/sk/synthetic
```
Enter the merge commit message when prompted.
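After the merge, one way to sanity-check the result is to confirm that each registered tree hash still appears as the `julia` subtree of some commit. The following is only a sketch: the helper name `commits_with_subtree`, the throwaway repository, and the file names are made up for illustration and are not part of the procedure above. In a real clone you would pass one of the `git-tree-sha1` values from the registry data instead of the demo hash.

```sh
#!/bin/sh
set -eu

# commits_with_subtree SUBDIR TREE (hypothetical helper): print every commit
# reachable from HEAD whose SUBDIR is exactly the tree object TREE.
commits_with_subtree() {
    subdir=$1
    want=$2
    git rev-list HEAD | while read -r commit; do
        # rev-parse fails for commits without SUBDIR; treat that as no match.
        have=$(git rev-parse --verify --quiet "$commit:$subdir" || true)
        if [ "$have" = "$want" ]; then
            echo "$commit"
        fi
    done
}

# Demo in a throwaway repository with made-up content.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

mkdir julia
echo 'v1' > julia/Project.toml
git add julia
git commit -q -m "v1"
v1_tree=$(git rev-parse HEAD:julia)

echo 'v2' > julia/Project.toml
git commit -q -a -m "v2"

# Only the first commit should carry the v1 tree under julia/.
matches=$(commits_with_subtree julia "$v1_tree")
echo "$matches"
```

An empty result for a registered hash would suggest that the corresponding version can no longer be located in this history.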
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707295383
https://issues.apache.org/jira/browse/ARROW-10228
----------------------------------------------------------------
[GitHub] [arrow] StefanKarpinski commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-708495972
Additional changes can be made before or after the merge, but the content of files in historical commits cannot be modified, since that would change the tree hashes and make it impossible to install the previous versions of Arrow.jl from this repo, which is the purpose of having this history in the repo. IMO, it would be easier and clearer to just make additional changes in this repo after the merge.
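To illustrate why editing a historical file is off the table, here is a minimal sketch in a throwaway repository (the file name and contents are invented for the example). It records the tree hash of the `julia` directory, prepends a header to one file, and shows that the tree hash is no longer the same.

```sh
#!/bin/sh
set -eu

# Throwaway repository; the file below is a made-up stand-in for package source.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

mkdir julia
printf 'module Arrow end\n' > julia/Arrow.jl
git add julia
git commit -q -m "original release"
tree_before=$(git rev-parse HEAD:julia)

# Prepending a header changes the blob, and therefore the enclosing tree hash.
printf '# License header\nmodule Arrow end\n' > julia/Arrow.jl
git commit -q -a -m "add header"
tree_after=$(git rev-parse HEAD:julia)

echo "before: $tree_before"
echo "after:  $tree_after"
```

The earlier commit still carries the original tree, which is why headers can be added in a new commit without disturbing the published versions.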
----------------------------------------------------------------
[GitHub] [arrow] StefanKarpinski commented on pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo
Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707302596
Note: I think this should probably be merged with a merge commit rather than rebase-merged, but let me know how y'all want the history to look. I can probably accommodate anything.
----------------------------------------------------------------
[GitHub] [arrow] quinnj commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-718005061
Can be closed in favor of #8547
----------------------------------------------------------------
[GitHub] [arrow] quinnj commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707350047
@StefanKarpinski, I have a few things I want to clean up/improve, so I can work on that over the next few days, and then we can just do a new "release" in JuliaData/Arrow.jl and push that release commit here using the same script you have. Does that sound reasonable? That would then include the license header changes.
----------------------------------------------------------------
[GitHub] [arrow] nealrichardson commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707349027
Looks like this PR needs the license headers prepended everywhere: maybe that can be done and then squashed into that last commit 0632ecf?
----------------------------------------------------------------
[GitHub] [arrow] StefanKarpinski commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-711018783
However, those rebased commits _cannot_ modify any of the files in any way, e.g. by putting headers in them. The headers can be added in a newer commit for the first version that is published as an official part of arrow.
----------------------------------------------------------------
[GitHub] [arrow] StefanKarpinski closed pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
StefanKarpinski closed pull request #8448:
URL: https://github.com/apache/arrow/pull/8448
----------------------------------------------------------------
[GitHub] [arrow] wesm commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-710135731
We can have a bigger discussion (e.g. on the mailing list), but in other instances we've done either a rebase-merge or a squash-merge for these. It's our preference to maintain a linear commit history in the main branch. How important is it to be able to install the old releases using the exact tree hash they had when they were released? Since this code is still pre-"production" (I think? I haven't looked at the status of the integration tests), I'm not sure how valuable it is to be able to install the old releases.
----------------------------------------------------------------
[GitHub] [arrow] StefanKarpinski commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation
Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-711016344
We guarantee that published Julia package versions remain installable forever, and versions are immutably identified by tree hashes, so it's quite important. If one couldn't install old versions from this repo, then this would have to contain a new, different Julia package so that the old one could remain installable. Fortunately, a rebase should not affect the necessary subtrees, so rebasing should be fine.
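As a rough illustration of why landing the commits on top of other content leaves the needed subtrees alone, the sketch below uses a throwaway repository with invented file names. It adds an unrelated sibling directory next to `julia` and compares hashes: the root tree changes, while the `julia` subtree keeps the same hash because it depends only on its own contents.

```sh
#!/bin/sh
set -eu

# Throwaway repository; directory and file names are made up for the example.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

mkdir julia
printf 'name = "Arrow"\n' > julia/Project.toml
git add julia
git commit -q -m "Arrow.jl release"
subtree_alone=$(git rev-parse HEAD:julia)
root_alone=$(git rev-parse 'HEAD^{tree}')

# Simulate sitting on top of other monorepo content: a sibling directory appears.
mkdir cpp
printf 'int main() { return 0; }\n' > cpp/main.cc
git add cpp
git commit -q -m "unrelated monorepo change"
subtree_after=$(git rev-parse HEAD:julia)
root_after=$(git rev-parse 'HEAD^{tree}')

echo "julia subtree: $subtree_alone -> $subtree_after"
echo "root tree:     $root_alone -> $root_after"
```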
----------------------------------------------------------------
[GitHub] [arrow] StefanKarpinski commented on pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo
Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707301689
Supersedes https://github.com/apache/arrow/pull/8393 but still needs IP clearance.