Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/12 18:53:48 UTC

[GitHub] [arrow] StefanKarpinski opened a new pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo

StefanKarpinski opened a new pull request #8448:
URL: https://github.com/apache/arrow/pull/8448


   This pull request merges a synthetic history of the Arrow.jl Julia package into the main arrow monorepo under the `julia` top-level directory. The history of Arrow.jl has been rewritten so that it appears that all development was done in this directory, retaining only a commit for each published version of the Arrow.jl package. Preserving this history (specifically the git tree objects associated with each commit) allows Julia's package manager to continue to install historical versions of Arrow.jl while having the arrow monorepo as the git repository of record going forward.
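   A minimal sketch of the property this relies on, in a scratch repository with made-up file contents: nesting a package's tree under `julia/` inside a larger tree (which is what moving the history into a subdirectory amounts to) creates a new top-level tree but reuses the package's own tree object, so its hash is unchanged. Plain git plumbing (`hash-object`, `mktree`, `commit-tree`, `rev-parse`) stands in for `git filter-repo` here.
   
   ```sh
   set -e
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example"
   git config user.email "example@example.com"
   
   # A made-up one-file package tree, as a standalone Arrow.jl repo would have it.
   blob=$(printf 'module Arrow end\n' | git hash-object -w --stdin)
   pkg_tree=$(printf '100644 blob %s\tArrow.jl\n' "$blob" | git mktree)
   
   # The same tree nested under julia/ next to other top-level monorepo content.
   readme=$(printf 'Apache Arrow\n' | git hash-object -w --stdin)
   mono_tree=$(printf '040000 tree %s\tjulia\n100644 blob %s\tREADME.md\n' \
     "$pkg_tree" "$readme" | git mktree)
   commit=$(git commit-tree -m "monorepo snapshot" "$mono_tree")
   
   # The top-level tree is new, but the julia/ subtree is the package's own tree.
   echo "package tree: $pkg_tree"
   echo "julia/ tree:  $(git rev-parse "$commit:julia")"
   ```
   
   The same `git rev-parse <commit>:julia` lookup is a quick way to spot-check a merged commit against the `git-tree-sha1` recorded for that version.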
   
   I'm making this pull request on behalf of the Arrow.jl project (cc @quinnj, @ExpandingMan) as the resident git mage. Let me know if there's anything I should change about this PR to integrate better into the arrow project.
   
   ---
   
   For my own record (in case I need to do this again), here's the code I used to generate the synthetic history:
   ```jl
   using TOML
   
   data = TOML.parse("""
   ["0.1.2"]
   git-tree-sha1 = "5cab061e3fcf0d78291f9c4b3db1f58c8f5e1bc5"
   
   ["0.2.0"]
   git-tree-sha1 = "5081382c0e5c78c1849b9841b9d8941437060b48"
   
   ["0.2.1"]
   git-tree-sha1 = "ecfe11bd0874ab41b78be0ca8d0f680ba37978dc"
   
   ["0.2.2"]
   git-tree-sha1 = "c66fc3e71747c99a3e3940ade685c0d8ea66c0ae"
   
   ["0.2.3"]
   git-tree-sha1 = "d3c36842140057276f6f8348afa08f0f7dae2d1e"
   
   ["0.2.4"]
   git-tree-sha1 = "c86df6ed41b3bd192d663e5e0e7cac0d11fd4375"
   
   ["0.3.0"]
   git-tree-sha1 = "76641f71ac332cd4d3cf54b98234a0f597bd7a2f"
   """)
   
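   # Map each published version to the git tree hash recorded for it in the registry.
   trees = Dict(VersionNumber(k) => v["git-tree-sha1"] for (k, v) in data)
   
   ENV["GIT_AUTHOR_NAME"] = "Jacob Quinn"
   ENV["GIT_AUTHOR_EMAIL"] = "quinn.jacobd@gmail.com"
   
   # Chain one commit per version, oldest first, on top of a fixed base commit.
   let commit = "16b729db74d78ecb010efab855c9e46c8052f59e"
       for (ver, tree) in sort!(collect(trees), by=first)
           message = """
           ARROW-10228: [Julia] Arrow.jl v$ver
   
           Co-authored-by: Michael Savastio <sa...@gmail.com>
           """
           # Reuse the author date of the commit tagged v$ver.
           ENV["GIT_AUTHOR_DATE"] = readchomp(`git show -s --format=%ai v$ver`)
           # Create a commit whose tree is exactly the registered tree.
           commit = readchomp(`git commit-tree -p $commit -m $message $tree`)
       end
       run(`git branch -f sk/synthetic $commit`)
   end
   # Move everything under julia/; this rewrites the commits but not the package subtrees.
   run(`git filter-repo --force --to-subdirectory-filter julia`)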
   ```
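   For readers less familiar with git plumbing, a minimal shell sketch of what each loop iteration does, run in a scratch repository with made-up contents and dates: `git commit-tree` creates a commit for an existing tree object, `-p` chains it onto the previous commit, and `GIT_AUTHOR_DATE` backdates it. The trees are built with `git mktree` via a hypothetical `make_tree` helper here; in the script above they come from the `git-tree-sha1` entries instead.
   
   ```sh
   set -e
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example"
   git config user.email "example@example.com"
   
   # make_tree CONTENT: store a one-file tree and print its hash (hypothetical helper).
   make_tree() {
     blob=$(printf '%s\n' "$1" | git hash-object -w --stdin)
     printf '100644 blob %s\tVERSION\n' "$blob" | git mktree
   }
   tree1=$(make_tree "0.1.0")
   tree2=$(make_tree "0.2.0")
   
   # One commit per release, oldest first, each with a backdated author date.
   c1=$(GIT_AUTHOR_DATE="2020-01-01 12:00:00 +0000" git commit-tree -m "v0.1.0" "$tree1")
   c2=$(GIT_AUTHOR_DATE="2020-02-01 12:00:00 +0000" git commit-tree -p "$c1" -m "v0.2.0" "$tree2")
   git branch -f synthetic "$c2"
   
   git log --format='%h %ai %s' synthetic
   ```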
   Then I used the following `.git/config` in a clone of `arrow`:
   ```gitconfig
   [core]
   	repositoryformatversion = 0
   	filemode = true
   	bare = false
   	logallrefupdates = true
   	ignorecase = true
   	precomposeunicode = true
   [remote "origin"]
   	url = https://github.com/apache/arrow.git
   	fetch = +refs/heads/*:refs/remotes/origin/*
   [remote "StefanKarpinski"]
   	url = https://github.com/StefanKarpinski/arrow.git
   	fetch = +refs/heads/*:refs/remotes/StefanKarpinski/*
   [remote "Arrow.jl"]
   	url = ../Arrow.jl
   	fetch = +refs/heads/*:refs/remotes/Arrow.jl/*
   
   [branch "master"]
   	remote = origin
   	merge = refs/heads/master
   ```
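   The same remotes can be added with `git remote add` instead of editing `.git/config` by hand. A minimal sketch, assuming a scratch repository stands in for the `arrow` clone (adding a remote does not contact it, so nothing is fetched):
   
   ```sh
   set -e
   work=$(mktemp -d)
   git init -q "$work/arrow"
   cd "$work/arrow"
   
   git remote add origin https://github.com/apache/arrow.git
   git remote add StefanKarpinski https://github.com/StefanKarpinski/arrow.git
   git remote add Arrow.jl ../Arrow.jl
   
   # Each remote gets a url and the default fetch refspec shown in the config above.
   git config --get-regexp '^remote\.'
   ```
   
   In an actual `git clone` of apache/arrow, `origin` and the `[branch "master"]` section are created by the clone, so only the last two `remote add` lines would be needed.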
   With that setup, you just do this in the `arrow` clone:
   ```sh
   git fetch Arrow.jl --no-tags
   git merge Arrow.jl/sk/synthetic
   ```
   Enter the merge commit message when prompted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707295383


   https://issues.apache.org/jira/browse/ARROW-10228



[GitHub] [arrow] StefanKarpinski commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-708495972


   Additional changes can be made before or after the merge, but the content of files in historical commits cannot be modified, since that would change the tree hashes and make it impossible to install previous versions of Arrow.jl from this repo, which is the whole purpose of having this history here. IMO, it would be easier and clearer to just make additional changes in this repo after the merge.
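   A minimal sketch of this constraint, in a scratch repository with a made-up package file and a placeholder header line (the `add_header` helper is hypothetical): amending a release commit to add a header changes the tree hash of `julia/`, whereas adding the header in a new commit on top leaves the release commit's tree exactly as registered.
   
   ```sh
   set -e
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git config user.name "Example"
   git config user.email "example@example.com"
   
   # Hypothetical helper: prepend a placeholder header line to the package file.
   add_header() {
     { printf '# <license header>\n'; cat julia/Arrow.jl; } > header.tmp
     mv header.tmp julia/Arrow.jl
   }
   
   mkdir julia
   printf 'module Arrow\nend\n' > julia/Arrow.jl
   git add julia
   git commit -q -m "Arrow.jl v0.1.0"
   release=$(git rev-parse HEAD)
   registered=$(git rev-parse HEAD:julia)   # the hash a registry would record
   
   # Option 1: rewrite the release commit itself.
   git checkout -q -b rewrite "$release"
   add_header
   git commit -q -a --amend --no-edit
   rewritten=$(git rev-parse HEAD:julia)
   
   # Option 2: add the header in a new commit on top of the release.
   git checkout -q -b append "$release"
   add_header
   git commit -q -a -m "Add license headers"
   kept=$(git rev-parse HEAD~1:julia)
   
   echo "registered: $registered"
   echo "amended:    $rewritten"
   echo "appended:   $kept"
   ```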



[GitHub] [arrow] StefanKarpinski commented on pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo

Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707302596


   Note: I think this should probably be merged rather than rebase-merged, but let me know how y'all want the history to look. I can probably accommodate anything.



[GitHub] [arrow] quinnj commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-718005061


   Can be closed in favor of #8547 



[GitHub] [arrow] quinnj commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707350047


   @StefanKarpinski, I have a few things I want to clean up/improve, so I can work on that over the next few days, and then we can just do a new "release" in JuliaData/Arrow.jl and push that release commit here using the same script you have; does that sound reasonable? That would then include the license header changes.



[GitHub] [arrow] nealrichardson commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707349027


   Looks like this PR needs the license headers prepended everywhere: maybe that can be done and then squashed into that last commit 0632ecf?



[GitHub] [arrow] StefanKarpinski commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-711018783


   However, those rebased commits _cannot_ modify any of the files in any way, e.g. by putting headers in them. The headers can be added in a newer commit for the first version that is published as an official part of arrow.



[GitHub] [arrow] StefanKarpinski closed pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
StefanKarpinski closed pull request #8448:
URL: https://github.com/apache/arrow/pull/8448


   



[GitHub] [arrow] wesm commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-710135731


   We can have a bigger discussion (e.g. on the mailing list), but in other instances we've done either a rebase-merge or a squash-merge for these. It's our preference to maintain a linear commit history in the main branch. How important is it to be able to install the old releases using the exact tree hashes they had when they were released? Since this code is still pre-"production" (I think? I haven't looked at the status of the integration tests), I'm not sure how valuable it is to be able to install the old releases.



[GitHub] [arrow] StefanKarpinski commented on pull request #8448: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-711016344


   We guarantee that published Julia package versions remain installable forever, and versions are immutably identified by tree hashes, so it's quite important. If one couldn't install old versions from this repo, then this would have to contain a new, different Julia package so that the old one could remain installable. Fortunately, a rebase should not affect the necessary subtrees, so rebasing should be fine.
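   A minimal sketch of that claim, in a scratch repository with made-up files (the `reachable` helper is hypothetical): replaying two release commits onto a branch that has moved on, as a rebase-merge does, produces new commits but the same `julia/` subtree hashes, whereas a squash-merge keeps only the final tree, so the earlier release's tree is no longer reachable from the branch.
   
   ```sh
   set -e
   repo=$(mktemp -d)
   cd "$repo"
   git init -q
   git symbolic-ref HEAD refs/heads/main
   git config user.name "Example"
   git config user.email "example@example.com"
   
   # reachable BRANCH OBJECT: succeed if OBJECT is reachable from BRANCH.
   reachable() { git rev-list --objects "$1" | cut -d' ' -f1 | grep -qx "$2"; }
   
   printf 'Apache Arrow\n' > README.md
   git add README.md
   git commit -q -m "base"
   
   # Two made-up package releases on a side branch.
   git checkout -q -b julia-history
   mkdir julia
   printf '0.1.0\n' > julia/VERSION
   git add julia
   git commit -q -m "v0.1.0"
   v010=$(git rev-parse HEAD:julia)
   printf '0.2.0\n' > julia/VERSION
   git commit -q -a -m "v0.2.0"
   v020=$(git rev-parse HEAD:julia)
   
   # Meanwhile main gains an unrelated commit outside julia/.
   git checkout -q main
   printf 'more\n' >> README.md
   git commit -q -a -m "unrelated change"
   
   # Rebase-merge: replay both release commits onto main.
   git checkout -q -b rebased julia-history
   git rebase -q main
   
   # Squash-merge: one commit containing only the final state.
   git checkout -q -b squashed main
   git merge -q --squash julia-history
   git commit -q -m "squashed"
   
   if reachable rebased "$v010"; then echo "rebase: v0.1.0 tree still reachable"; fi
   if ! reachable squashed "$v010"; then echo "squash: v0.1.0 tree not reachable"; fi
   ```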



[GitHub] [arrow] StefanKarpinski commented on pull request #8448: ARROW-10228: [Julia] merge Arrow.jl history into main arrow monorepo

Posted by GitBox <gi...@apache.org>.
StefanKarpinski commented on pull request #8448:
URL: https://github.com/apache/arrow/pull/8448#issuecomment-707301689


   Supersedes https://github.com/apache/arrow/pull/8393 but still needs IP clearance.

