Published on 2023-12-24, 1676 words
How I combined a few “junk repos” into my JunkDrawer monorepo
with git merge --allow-unrelated-histories
.
As a side effect of years of learning - new languages and frameworks, and new ideas - I’ve accumulated many dubiously-useful side projects and experiments. Some of these I’ve uploaded to Github as standalone repos, such as:
Proliferation of junk repos isn’t really a problem. Github’s free tier doesn’t place any limit on the number of public repos. Things were just starting to feel cluttered, so I wanted to consolidate these tiny projects into one place.
The junk drawer is a practical organizational technique: an escape-hatch when something doesn’t fit into a specific category. When you are trying to organize your desk or toolbox or kitchen, you can define some clear categories: silverware, knives, cooking utensils, baking, and so on. Every item belongs to one and only one category, and so it is placed in the corresponding drawer.
But there’s an inevitable remnant: toothpicks, meat thermometer, paper and pen, and so on. Perhaps their utility warrants a position in some kitchen drawer, but they don’t fit neatly into the Pure Kitchen Hierarchy you planned. But, you can still give them a category: they belong in the Junk Drawer.
I want to put all these individual repos into a monorepo. I’m calling it JunkDrawer:
$ mkdir $HOME/JunkDrawer && cd $HOME/JunkDrawer
$ git init
Now, I could just copy all my other projects directly…
$ cp -r $HOME/Code/* .
…but I’d lose the git history in each project if I do that. This whole exercise is about hoarding dubiously-useful code - so why not also preserve the dubiously-useful history that goes with it?
Git-merge is documented as a command to “join two or more development histories together”. In this context, a “history” is synonymous with a branch. The most common day-to-day usage for git-merge is to join two local branches together; for example, to join newer commits from a feature branch into a main branch.
As another day-to-day example, Git-pull is actually a compound operation combining a Git-fetch (to retrieve data from a remote branch) and a Git-merge (to join the changes in that remote branch with the local branch).
The Git-merge documentation has ASCII diagrams to illustrate a typical merge operation, joining a branch ‘Topic’ into another branch ‘Main’.
A---B---C Topic
/ \
D---E---F---G---H Main
Topic and Main, above, have a common point in history - they originally diverged at commit E. Commit E can also be described as a “common ancestor” of the two branches.
What I’m trying to accomplish is a different picture.
E---F---G---H---I Project-1
\
A---B Junk-Drawer
/
V---W---X---Y---Z Project-2
Project-1 and Project-2 don’t have any common ancestor commit. In git parlance, this means they are “unrelated histories”. In particular, they are also unrelated histories with respect to Junk-Drawer, which I’m trying to merge them into.
Let’s see what happens if I try to do this merge:
$ pwd
/home/logan/Code/JunkDrawer
$ git remote add github/mexprander https://github.com/hoodlm/mexprander.git
$ git fetch github/mexprander
remote: Enumerating objects: 17, done.
remote: Counting objects: 100% (17/17), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 17 (delta 2), reused 17 (delta 2), pack-reused 0
Unpacking objects: 100% (17/17), 3.13 KiB | 458.00 KiB/s, done.
From https://github.com/hoodlm/mexprander
* [new branch] main -> github/mexprander/main
$ git merge github/mexprander/main
**fatal: refusing to merge unrelated histories**
There’s a flag for that.
--allow-unrelated-histories
By default, git merge command refuses to merge histories that do not share a common ancestor. This option can be used to override this safety when merging histories of two projects that started their lives independently. As that is a very rare occasion, no configuration variable to enable this by default exists and will not be added.
I have, in fact, stumbled on the very rare occasion alluded to by the documentation! So, let’s try it out:
$ git merge --allow-unrelated-histories -- github/mexprander/main
$ ls
Cargo.toml README.md src
So far so good! Let’s try a second repo, another Rust project:
$ git remote add github/logan-rosalind-rust https://github.com/hoodlm/logan-rosalind-rust.git
$ git fetch
...
$ git merge --allow-unrelated-histories -- github/logan-rosalind-rust
Auto-merging .gitignore
CONFLICT (add/add): Merge conflict in .gitignore
Auto-merging Cargo.toml
CONFLICT (add/add): Merge conflict in Cargo.toml
Auto-merging README.md
CONFLICT (add/add): Merge conflict in README.md
**Automatic merge failed; fix conflicts and then commit the result.**
Well, that didn’t go so well. Right off the bat,
I have identical filepaths for README, .gitignore, and Cargo.toml. This blind merge also crams
all the files in the src
directory together, which is not what I want either.
The core problem is that git-merge preserves the file hierarchy of both commits. In a perfect world there would be a single-shot project-merge command that knows to merge files into per-project subdirectories: so that they land in ‘RustProject1/README’ and ‘RustProject2/README’. But, it’s easy enough for me to set up these subdirectories myself before merging.
Typically I would run this in the git repo root:
$ PROJECT="rust/mexprander"
$ mkdir -p $PROJECT
$ mv * $PROJECT/
I also want to pull the .gitignore file along:
$ mv .gitignore $PROJECT/
Finally, I would do a quick test of the project at this point (for example, cargo clean && cargo test
) to make sure that nothing
got broken or lost in the move.
Then I made a git-commit (locally, not pushed yet) with the move. This is what this commit looks like:
$ git diff --compact-summary 1e2f91f7^ 1e2f91f7
.gitignore => rust/mexprander/.gitignore | 0
Cargo.toml => rust/mexprander/Cargo.toml | 0
README.md => rust/mexprander/README.md | 0
{src => rust/mexprander/src}/lib.rs | 0
4 files changed, 0 insertions(+), 0 deletions(-)
I also wanted to do a bit of deconfliction in the git commit log. For example, two different project histories might have a commit with the exact message “Adding README.md”. To disambiguate, I want a brief project annotation at the front of each commit message, like “[mexprander] Adding README.md” and “[rosalind] Adding README.md”.
None of my projects had a very long history (no more than 20 commits), so I annotated semi-manually with an interactive git-rebase from the --root commit. (The “committer-date-is-author-date” flag prevents commit timestamps from getting bumped to the present.)
$ git rebase --committer-date-is-author-date -i --root
Specify that every commit needs to be reworded:
%s/pick/reword
Then, I modified each message to have a little tag prepended to it, for example:
- Add multiply token
+ [mexprander] Add multiply token
Again, I am keeping this rebased history local on my machine for now.
I’m ready to retry my merge. This time, I want to merge from the local repositories, which have non-conflicting directory structures. It’s actually possible to use a local directory as a “remote” repository in git, like so:
$ git remote add someproject $HOME/Code/SomeProject
The sequence that I followed for each project was like this:
$ git remote add someproject ~/your/local/path/to/someproject
$ git fetch someproject --tags
$ git merge --allow-unrelated-histories -- someproject/main
$ git remote remove someproject
All of my code is now in a coherent directory structure in a single git repository. Slightly abridged tree output:
$ tree .
.
├── hn_offline <-- originally its own repo
│ ├── hn_offline.sh
│ ├── LICENSE.txt
│ ├── out
│ └── README.md
├── html
│ └── tango.html
├── lographiq <-- originally its own repo
│ ├── bin
│ ├── example
│ ├── lib
│ ├── LICENSE
│ └── README
├── README.md
└── rust
├── mexprander <-- originally its own repo
│ ├── Cargo.toml
│ ├── README.md
│ └── src
│ └── lib.rs
└── rosalind <-- originally its own repo
├── Cargo.toml
├── LICENSE.md
├── README.md
└── src
├── dna.rs
└── subs.rs
And, after all this effort, the commit log is historically complete and shows the origin of each project, with the right author dates on every commit.
$ git log --all --pretty=format:"%h %as %s" --graph
* a25d438 2023-12-17 html - tango color palette
* 6b9d9b5 2023-12-17 Repo-wide README and consolidated .gitignore
* 74f85bf 2023-12-17 Merge remote-tracking branch 'lographiq/master'
|\
| * 8f9c651 2023-12-17 [lographiq] Move to lographiq directory for merge into JunkDrawer monorepo
| * bd947b2 2015-07-05 [lographiq] more playing around with polygon primitives
| * 619cc87 2015-07-05 [lographiq] simple invert effect
| ...
| * 67b8e73 2015-06-12 [lographiq] adding some additional logging to examples
| * b35738a 2015-06-12 [lographiq] Initial commit - basic 'draw' engine, canvas library that can render to ppm, and two simple examples
* dd6856b 2023-12-17 Merge remote-tracking branch 'rosalind/main'
|\
| * 4729e71 2023-12-17 [rust-rosalind] Move to rust/rosalind directory for merge into JunkDrawer monorepo
| * 43f94be 2023-11-10 [rust-rosalind] iprb - mendel's first law probability calculator
| ...
| * 5b0bf20 2023-10-22 [rust-rosalind] RNA
| * 14bb9c3 2023-10-22 [rust-rosalind] Initial project setup, solution to first DNA problem
* 83a14f7 2023-12-17 Merge remote-tracking branch 'mexpr/main'
|\
| * 1e2f91f 2023-12-17 [mexprander] Move to rust/mexprander/ directory for merge into JunkDrawer monorepo
| * cf03643 2023-11-30 [mexprander] README
| * 15f5b25 2023-11-30 [mexprander] Add multiply token
| * a094f3c 2023-11-26 [mexprander] Basic expression expander for flat additive lists
* 8fe2182 2023-12-17 [hn_offline] Moving everything to hn_offline directory for merge into JunkDrawer monorepo
* 7d3223d 2023-01-29 [hn_offline] Rewrite to use wget recursive mode
* da4e444 2023-01-26 [hn_offline] Download hn.js
* ...
* fd19658 2023-01-21 [hn_offline] Clean shellcheck
* 779db2b 2023-01-21 [hn_offline] Initial implementation
If you look at my JunkDrawer on Github, though, the dates will look wrong. When I did my migration, I didn’t know about the --committer-date-is-author-date flag on git-rebase (although I do include it in the instructions above). So, when I annotated all the commit messages, Git also bumped the commit date to the present time. GitHub displays Commit Time, not Author Time - so, about 8 years worth of sporadic commits appear to be compressed into about an hour.
But, all in all, still a successful migration. Back to creating Junk…