Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2015/12/16 23:27:46 UTC

[jira] [Comment Edited] (LUCENE-6933) Create a (cleaned up) SVN history in git

    [ https://issues.apache.org/jira/browse/LUCENE-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061015#comment-15061015 ] 

Dawid Weiss edited comment on LUCENE-6933 at 12/16/15 10:27 PM:
----------------------------------------------------------------

After some more digging and experiments, it seems realistic that the following multi-step process will get us to the goals above.
* truncate all jar files (and possibly other binaries) to zero length (the equivalent of {{cat /dev/null > file}}) directly in the SVN dump, or use https://rtyley.github.io/bfg-repo-cleaner/ to remove or truncate them in the git repo
* create local SVN repo with the above, preserving dummy commits so that version numbers match Apache's SVN
* use {{git-svn}} to mirror (separately) {{lucene/java/*}}, {{lucene/dev/*}} and Solr's pre-merge history.
* import those separate history trees into one git repo, use grafts and branch filtering to stitch them together.
* do any final cleanups (correct commit author addresses, clean up junk branches and tags, add actual release tags throughout the history).
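
To make the first step above more concrete, here is a minimal Python sketch of the truncation idea. It is only an illustration: it works on a checked-out directory tree rather than on an SVN dump (rewriting a dump would also require fixing up content lengths and checksums, which is not shown), and the function name {{truncate_binaries}} and the suffix list are assumptions made for this example, not part of any existing tool.

```python
import os

# Assumption for illustration: only ".jar" is treated as binary here.
# Extend the tuple with other extensions as needed.
BINARY_SUFFIXES = (".jar",)


def truncate_binaries(root, suffixes=BINARY_SUFFIXES):
    """Empty every matching file under root, keeping the (now empty) file in place.

    Returns the sorted list of paths that were truncated.
    """
    truncated = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata directories.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        for name in filenames:
            if name.lower().endswith(suffixes):
                path = os.path.join(dirpath, name)
                # Opening in "wb" mode truncates the file to zero bytes,
                # the same effect as `cat /dev/null > file`.
                with open(path, "wb"):
                    pass
                truncated.append(path)
    return sorted(truncated)
```

Calling {{truncate_binaries("some/checkout")}} leaves every jar in place as an empty file, which matches the goal of keeping the files for historical reference only.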

I'll proceed and try to do all of the above locally. If it works, I'll push a "test" repo to GitHub so that folks can inspect it. Everything takes ages. Patience.




> Create a (cleaned up) SVN history in git
> ----------------------------------------
>
>                 Key: LUCENE-6933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6933
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>         Attachments: multibranch-commits.log
>
>
> Goals:
> * selectively drop projects and core-irrelevant stuff:
>   ** {{lucene/site}}
>   ** {{lucene/nutch}}
>   ** {{lucene/lucy}}
>   ** {{lucene/tika}}
>   ** {{lucene/hadoop}}
>   ** {{lucene/mahout}}
>   ** {{lucene/pylucene}}
>   ** {{lucene/lucene.net}}
>   ** {{lucene/old_versioned_docs}}
>   ** {{lucene/openrelevance}}
>   ** {{lucene/board-reports}}
>   ** {{lucene/java/site}}
>   ** {{lucene/java/nightly}}
>   ** {{lucene/dev/nightly}}
>   ** {{lucene/dev/lucene2878}}
>   ** {{lucene/sandbox/luke}}
>   ** {{lucene/solr/nightly}}
> * preserve the history of all changes to core sources (Solr and Lucene).
>   ** {{lucene/java}}
>   ** {{lucene/solr}}
>   ** {{lucene/dev/trunk}}
>   ** {{lucene/dev/branches/branch_3x}}
>   ** {{lucene/dev/branches/branch_4x}}
>   ** {{lucene/dev/branches/branch_5x}}
> * provide a way to link git commits and history with svn revisions (amend the log message).
> * annotate release tags
> * deal with large binary blobs (JARs): keep empty files instead for their historical reference only.
> Non-goals:
> * no need to preserve "exact" merge history from SVN (see "impossible" below).
> * the ability to build ancient versions is not a requirement.
> Impossible:
> * It is not possible to preserve SVN "merge history" because of the following reasons:
>   ** Each commit in SVN operates on individual files. So one commit can "copy" (and record a merge) files from anywhere in the object tree, even modifying them along the way. There simply is no equivalent for this in git. 
>   ** There are historical commits in SVN that apply changes to multiple branches in one commit ({{r1569975}}) and that merge *from* multiple branches in one commit ({{r940806}}).
> * Because exact merge tracking is impossible, an exact "linearized" history of a given file is also impossible to record. Let's say changes X, Y and Z have been applied to a branch of a file A and then merged back. In git, this would be reflected as a single commit flattening X, Y and Z (on the target branch) and three independent commits on the branch. The "copy-from" link from one branch to another cannot be represented because, as mentioned, merges are done on entire branches in git, not on individual files. And yes, there are commits in the SVN history that merge selected files rather than entire branches.
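
For the goal of linking git commits with SVN revisions, one possible approach is sketched below in Python. It relies on the {{git-svn-id:}} trailer that {{git-svn}} appends to each imported commit message (in the form {{git-svn-id: <url>@<revision> <repository-uuid>}}). The function names and the {{svn-revision: rNNN}} output format are assumptions made for this example; the issue does not specify how the log message should be amended.

```python
import re

# Matches the trailer line that git-svn adds to imported commit messages:
#   git-svn-id: <url>@<revision> <repository-uuid>
GIT_SVN_ID = re.compile(r"^git-svn-id:\s+(\S+)@(\d+)\s+(\S+)\s*$", re.MULTILINE)


def svn_revision(message):
    """Return the SVN revision number recorded in a commit message, or None."""
    match = GIT_SVN_ID.search(message)
    return int(match.group(2)) if match else None


def link_to_svn(message):
    """Replace the git-svn-id trailer with a short revision reference.

    The "svn-revision: rNNN" format is a hypothetical choice for illustration.
    Messages without a git-svn-id trailer are returned unchanged.
    """
    rev = svn_revision(message)
    if rev is None:
        return message
    body = GIT_SVN_ID.sub("", message).rstrip()
    return f"{body}\n\nsvn-revision: r{rev}\n"
```

A function like {{link_to_svn}} could be applied to every commit message during a history rewrite; the rewrite mechanism itself is left out of this sketch.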


