Migrating multiple repositories to Git
A few weeks ago I faced the challenge of migrating and merging multiple SVN and Git repositories into a single repository. The Stack Overflow discussion “Merge two separate SVN repositories into a single Git repository” contains all the information required to solve this problem. This post is a concise summary of the bits and pieces presented in that discussion.
The plan is simple:
- clone the involved Git repositories
- migrate relevant SVN repositories to Git
- rewrite the repositories in case of overlaps or errors
- create new repository and add empty commit
- add remotes for all repositories
- fetch all remotes
- create a list of all commits of all repositories, sort it chronologically
- cherry-pick each commit in the list and apply it in the new repository
Here are the commands that implement the plan above. First, clone the Git repositories and migrate the SVN repositories.
    mkdir ~/delme
    cd ~/delme/
    git clone ~/dev/repo1
    git clone ~/dev/repo2
    git svn clone svn://server:/repo3/
    git svn clone svn://server:/repo4/
If the repositories contain files or folders with the same names, a history rewrite is necessary. Assuming repo1 overlaps with other repositories, it is a good idea to put the contents of repo1 in a subfolder in the target repository. To accomplish this, the history of the master branch of repo1 is rewritten so that all its contents are moved to the folder “subfolder” in every commit.
    cd repo1
    git filter-branch --tree-filter 'mkdir -p subfolder; find . -mindepth 1 -maxdepth 1 -not -name subfolder -exec mv {} subfolder \;' master
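Before rewriting a real repository, the tree filter can be tried on a throwaway one. The following is only a sketch under stated assumptions: the scratch repository, its files (README, src/x.c) and the demo identity are made up for illustration and are not part of the original migration. FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer Git versions print for filter-branch.

```shell
#!/bin/sh
# Sketch: try the "move everything into subfolder" rewrite on a scratch repo.
set -eu

# Hypothetical identity so the demo does not depend on local Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

# Two commits with content at the top level of the repository.
echo hello > README
mkdir src
echo 'int x;' > src/x.c
git add .
git commit -q -m 'first'
echo more >> README
git commit -q -am 'second'

# Same rewrite as above: move all top-level entries into "subfolder".
git filter-branch --tree-filter 'mkdir -p subfolder; find . -mindepth 1 -maxdepth 1 -not -name subfolder -exec mv {} subfolder \;' master > /dev/null

# List the files of the rewritten branch tip.
git ls-tree -r --name-only master
# subfolder/README
# subfolder/src/x.c
```

If the listing shows anything outside “subfolder”, the filter did not match what was expected and should be adjusted before running it on the real clone.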
In this step, it is also possible to remove files from a repository's history entirely. The following command removes the file “invalidfile” in “subfolder” from every commit on the master branch.
    git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch subfolder/invalidfile' master
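To confirm that a file is really gone from the rewritten branch, the number of commits that still touch it can be counted. Again a sketch on a scratch repository: the file “keep” and the demo identity are assumptions made for illustration, only “subfolder/invalidfile” comes from the command above.

```shell
#!/bin/sh
# Sketch: remove a file from history and verify that no commit touches it.
set -eu

# Hypothetical identity so the demo does not depend on local Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

# Two commits that both involve the unwanted file.
mkdir subfolder
echo keep > subfolder/keep
echo secret > subfolder/invalidfile
git add .
git commit -q -m 'add both files'
echo changed >> subfolder/invalidfile
git commit -q -am 'touch invalidfile'

# Same index filter as above.
git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch subfolder/invalidfile' master > /dev/null

# Commits on master that still touch the file (should be 0).
git rev-list --count master -- subfolder/invalidfile
# Files left at the branch tip.
git ls-tree -r --name-only master
```

Note that filter-branch keeps a backup of the old branch under refs/original, so the file is only gone from master itself until that backup is deleted.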
This can be repeated for the other repositories if necessary or desired. In the next step, the target repository that will hold the merged history is created. The migrated repositories are then added as remotes and fetched.
    mkdir ~/newrepo
    cd ~/newrepo
    git init .
    git commit --allow-empty -m 'Initial commit (empty)'
    git branch seed
    git checkout seed
    git remote add repo1 ~/delme/repo1
    git remote add repo2 ~/delme/repo2
    git remote add repo3 ~/delme/repo3
    git remote add repo4 ~/delme/repo4
    git fetch repo1
    git fetch repo2
    git fetch repo3
    git fetch repo4
Finally, a file listing the commits is created for each repository. Each entry holds the commit's author timestamp (seconds since 1970-01-01 UTC) and its hash. The lists are then merged and sorted by timestamp, and the resulting list of hashes is stored in the file “ordered_commits”. This file is then read line by line, and each commit hash is passed to the git cherry-pick command.
    git --no-pager log --format='%at %H' repo1/master > repo1_commits
    git --no-pager log --format='%at %H' repo2/master > repo2_commits
    git --no-pager log --format='%at %H' repo3/master > repo3_commits
    git --no-pager log --format='%at %H' repo4/master > repo4_commits
    cat repo?_commits | sort -n | cut -d' ' -f2 > ordered_commits
    while read -r commit; do git cherry-pick "$commit"; done < ordered_commits
The cherry-pick command tells Git to apply the changes of a commit on top of the current branch. The result is a repository containing all commits from all four repositories in chronological order. That’s all there is to it.
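The whole procedure can be rehearsed end to end on small throwaway repositories before running it on real data. The sketch below is not the original migration: it uses two scratch repositories instead of four, and the helper make_commit, the file names, the fixed timestamps and the demo identity are all assumptions chosen so that the resulting order is predictable.

```shell
#!/bin/sh
# Sketch: interleave the histories of two scratch repos by author timestamp.
set -eu

# Hypothetical identity so the demo does not depend on local Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# make_commit <repo> <file> <unix-timestamp>
# Hypothetical helper: commit one new file with a fixed author date.
make_commit() {
    echo "$2" > "$1/$2"
    git -C "$1" add "$2"
    GIT_AUTHOR_DATE="$3 +0000" GIT_COMMITTER_DATE="$3 +0000" \
        git -C "$1" commit -q -m "add $2"
}

for r in repo1 repo2; do
    git init -q "$r"
    git -C "$r" symbolic-ref HEAD refs/heads/master
done

# repo1 gets the oldest and the newest commit, repo2 the one in between.
make_commit repo1 a.txt 1000000000
make_commit repo2 b.txt 1000000100
make_commit repo1 c.txt 1000000200

# Target repository with an empty seed commit.
git init -q target
cd target
git commit -q --allow-empty -m 'Initial commit (empty)'

# Add and fetch each source, and write its "<timestamp> <hash>" list.
for r in repo1 repo2; do
    git remote add "$r" "$work/$r"
    git fetch -q "$r"
    git --no-pager log --format='%at %H' "$r/master" > "$work/${r}_commits"
done

# Merge the lists, sort numerically by timestamp, keep only the hashes.
sort -n "$work"/repo?_commits | cut -d' ' -f2 > "$work/ordered_commits"

# Replay every commit, oldest first.
while read -r commit; do
    git cherry-pick "$commit" > /dev/null
done < "$work/ordered_commits"

git --no-pager log --reverse --format='%s'
# Initial commit (empty)
# add a.txt
# add b.txt
# add c.txt
```

In this sketch the files never overlap, so every cherry-pick applies cleanly. On real histories a cherry-pick can stop on a conflict or on a merge commit, in which case the loop has to be resumed by hand from the remaining entries in “ordered_commits”.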
2 Comments
Chris, the technique seems cumbersome but it really works flawlessly. I cleaned and merged 2 SVN and 2 Git repositories; the SVN repositories had over 4 years of history. Now everything is in one place, neat and clean.
I would have assumed you could do some kind of octopus merge to bring them all together. But looking it up I see a merge is only possible when there’s common ancestry, so this cherry-pick/rebase method is required.