How to purge sensitive data from Gitlab

By Harry Robbins, 17 May 2021

A friend of mine recently committed an unencrypted and supposedly secret API key to Git by accident, got their remotes in a twist, and pushed the secret to a repo on our self-hosted Gitlab instance. The approach below requires root access to a self-hosted Gitlab instance; for a workaround that doesn’t, skip to the end.

Fixing this takes a few steps, because Gitlab quietly keeps extra references to your commits in the background.

First, remove the offending commit from Git. This approach assumes the mistake is in your most recent commit and that you are happy to delete that commit completely:

git reset --hard HEAD~1 && git push --force

This moves the head of your current branch back to the previous commit, so your most recent commit is no longer part of the branch’s history. (The commit object itself still exists in your local repository until garbage collection removes it.) Once you force-push, the branch on the server no longer references that commit either.
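If you’d like a slightly safer version of that one-liner, here’s a minimal sketch as a shell function. The function name is made up for this example, and it assumes your remote is called origin. It prints the commit you’re about to throw away, and uses --force-with-lease instead of --force, which refuses to overwrite the remote branch if someone else has pushed to it since you last fetched.

```shell
# Hypothetical helper: drop the most recent commit on the current branch
# and force-push the result. Assumes the remote is named "origin".
drop_last_commit_and_push() {
  local branch
  branch=$(git symbolic-ref --short HEAD) || return 1

  # Show what is about to be thrown away, so you can check it is the right commit.
  git --no-pager log -1 --stat

  # Move the branch pointer back one commit and reset the working tree to match.
  git reset --hard HEAD~1 || return 1

  # Unlike a bare --force, --force-with-lease refuses to overwrite the remote
  # branch if it has moved since your remote-tracking branch was last updated.
  git push --force-with-lease origin "$branch"
}
```

Run it from inside your clone, and read the log output before trusting the result.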

However, Gitlab still keeps the offending commit and its files on the server, and you can still view them through the Gitlab UI, e.g. at https://gitlab.mydomain.com/outlandish/gitcrimes/-/commit/052983ee7dd36eb268b63f9e49b99f1772e839ba and https://gitlab.mydomain.com/outlandish/gitcrimes/-/blob/052983ee7dd36eb268b63f9e49b99f1772e839ba/.env.example

Normally these objects would eventually be deleted by ‘garbage collection’, which periodically removes Git objects that are no longer reachable from any reference (branch, tag, etc.). You can trigger this manually on the Gitlab server using the “housekeeping” function.
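You can watch this happen on a throwaway repository. The two helpers below are hypothetical names made up for this sketch: object_exists checks whether an object is still stored, and purge_unreachable_objects expires the reflog (which otherwise keeps a reset commit reachable in your local clone) and then prunes everything unreachable immediately.

```shell
# Hypothetical helpers for experimenting with garbage collection. Both take
# the path to a Git directory (".git" in a normal clone, or the repository
# directory itself for a bare repo).

# Succeeds if the object (commit, blob, ...) is still stored in the repo.
object_exists() {
  git --git-dir "$1" cat-file -e "$2" 2>/dev/null
}

# Forget all reflog history, then delete every unreachable object right away
# instead of waiting for the default two-week grace period.
purge_unreachable_objects() {
  git --git-dir "$1" reflog expire --expire=now --all &&
    git --git-dir "$1" gc --prune=now --quiet
}
```

For example, after git reset --hard HEAD~1, object_exists .git <old hash> still succeeds, even after a plain git gc; it only fails once purge_unreachable_objects .git has run.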

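In our case, however, housekeeping did not remove the offending content. This is because Gitlab automatically creates “keep-around” references (a kind of hidden branch, stored under refs/keep-around/) that point at commits, so that links to those commits in the Gitlab UI don’t turn into 404 errors. As long as such a reference exists, the commit still counts as referenced and garbage collection keeps it.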
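The sketch below simulates this on a throwaway bare repository standing in for the copy on the server. It assumes the refs/keep-around/<commit hash> naming; the function names are hypothetical, and this isn’t code that Gitlab itself runs. The second helper uses git for-each-ref --contains to list every ref whose history still includes the commit, i.e. everything stopping gc from deleting it.

```shell
# Hypothetical helper: imitate Gitlab by pointing a hidden
# refs/keep-around/<hash> ref at a commit in a throwaway bare repository.
keep_around() {
  local git_dir=$1 sha=$2
  git --git-dir "$git_dir" update-ref "refs/keep-around/$sha" "$sha"
}

# List every ref whose history contains the commit. While this prints
# anything, "git gc --prune=now" will not delete the commit.
refs_keeping_commit() {
  git --git-dir "$1" for-each-ref --contains "$2" --format='%(refname)'
}
```

Even after the branch has been rewound and git gc --prune=now has run, git cat-file -e <hash> still succeeds in the bare repo, and refs_keeping_commit prints only the keep-around ref.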

To remove these references you need to SSH into the Gitlab server, find the relevant Git repository and delete the references manually.

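On an Omnibus install, repositories are stored under /var/opt/gitlab/git-data/repositories/ by default. With the older “legacy” storage layout, the path for our “gitcrimes” project mentioned above is /var/opt/gitlab/git-data/repositories/outlandish/gitcrimes.git; with hashed storage (the default for new projects in more recent Gitlab versions), the repository lives under an @hashed/ directory instead.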
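If your instance uses hashed storage, the directory name is derived from the project’s numeric ID (shown on the project’s overview page). Here is a minimal sketch, assuming the layout described in Gitlab’s repository storage documentation, @hashed/<first 2 hex characters>/<next 2>/<SHA-256 of the ID>.git, and GNU coreutils’ sha256sum; hashed_repo_path is a hypothetical name.

```shell
# Hypothetical helper: print the hashed-storage path of a project,
# relative to the repositories directory, given its numeric project ID.
hashed_repo_path() {
  local hash
  # SHA-256 of the ID as a decimal string, with no trailing newline.
  hash=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
  printf '@hashed/%s/%s/%s.git\n' "${hash:0:2}" "${hash:2:2}" "$hash"
}
```

For example, ls -d /var/opt/gitlab/git-data/repositories/$(hashed_repo_path 42) should show the repository of the project with ID 42.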

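The things to edit in that directory are the refs directory (which stores loose references as individual files, e.g. refs/keep-around/<commit hash>) and the packed-refs file.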

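Back up the affected files, then search both for the offending commit hash (052983ee7dd36eb268b63f9e49b99f1772e839ba in the example above), e.g. with sudo grep -r <hash> <path to repo>/refs <path to repo>/packed-refs.

In packed-refs (e.g. sudo nano <path to repo>/packed-refs), remove the lines that mention the offending commit hash and save the file. Under refs, delete any loose reference file that contains that hash.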
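Instead of hand-editing the files, you could also let Git find and delete the references. A minimal sketch (remove_keep_around_refs is a hypothetical name): it lists every ref that can still reach the offending commit, then deletes only the ones under refs/keep-around/, using git update-ref -d, which removes a ref whether it is stored as a loose file or as a line in packed-refs. Anything else it lists, such as a branch or tag, is left for you to decide on. It assumes you run it as the user that owns the repositories (git on a default Omnibus install).

```shell
# Hypothetical helper: find the refs that keep a commit alive and delete
# the keep-around ones.
# Usage: remove_keep_around_refs <path-to-bare-repo> <commit-hash>
remove_keep_around_refs() {
  local git_dir=$1 sha=$2 refs ref

  # Show every ref whose history still contains the commit.
  echo "Refs that still reach $sha:"
  git --git-dir "$git_dir" for-each-ref --contains "$sha" --format='  %(refname)' || return 1

  # Collect only refs under refs/keep-around/; other refs are left alone.
  refs=$(git --git-dir "$git_dir" for-each-ref --contains "$sha" \
    --format='%(refname)' refs/keep-around/) || return 1

  for ref in $refs; do
    # update-ref -d handles both loose ref files and packed-refs entries.
    git --git-dir "$git_dir" update-ref -d "$ref" && echo "Deleted $ref"
  done
}
```

If the first list still shows refs outside refs/keep-around/ afterwards, garbage collection will keep the commit until those are dealt with too.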

Then run git --git-dir <path to repo> gc --aggressive --prune=now on the server as the git user (e.g. via sudo -u git), so that any files it rewrites keep the right ownership. This is similar to running housekeeping via the Gitlab UI, but --prune=now also removes unreferenced objects no matter how recent they are (by default, only unreachable objects at least two weeks old are pruned).
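To confirm the purge on the server, a small helper can run the same gc command and then check whether the commit object is still there. This is a sketch with a hypothetical function name; if the commit survives, it lists any refs that still reach it.

```shell
# Hypothetical helper: prune everything unreachable right away, then
# confirm the offending commit is really gone.
# Usage (as the git user): prune_and_verify <path-to-bare-repo> <commit-hash>
prune_and_verify() {
  local git_dir=$1 sha=$2
  git --git-dir "$git_dir" gc --aggressive --prune=now --quiet || return 1

  if git --git-dir "$git_dir" cat-file -e "$sha" 2>/dev/null; then
    echo "$sha is still present; refs that still reach it:" >&2
    git --git-dir "$git_dir" for-each-ref --contains "$sha" --format='  %(refname)' >&2
    return 1
  fi
  echo "$sha has been pruned"
}
```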

Once you’ve done the above, the pages that previously showed the sensitive data in the UI should return 404 errors (e.g. https://gitlab.mydomain.com/outlandish/gitcrimes/-/blob/052983ee7dd36eb268b63f9e49b99f1772e839ba/.env.example should give you a 404).
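You can also check from the command line. For a private project, an unauthenticated request to the web UI may redirect to the sign-in page rather than return a 404, so this sketch asks the REST API’s commits endpoint instead. It assumes a personal access token with API read access in a GITLAB_TOKEN environment variable; both function names are hypothetical.

```shell
# Hypothetical helper: build the REST API URL for a single commit.
# The API accepts a URL-encoded "namespace/project" path as the project ID.
commit_api_url() {
  local base=$1 project=$2 sha=$3
  printf '%s/api/v4/projects/%s/repository/commits/%s\n' \
    "$base" "${project//\//%2F}" "$sha"
}

# Print only the HTTP status code; 404 means Gitlab can no longer find it.
# Assumes GITLAB_TOKEN holds a personal access token.
commit_http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --header "PRIVATE-TOKEN: $GITLAB_TOKEN" "$(commit_api_url "$@")"
}
```

For example, commit_http_status https://gitlab.mydomain.com outlandish/gitcrimes 052983ee7dd36eb268b63f9e49b99f1772e839ba should print 404 once the commit is gone.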

Another way to achieve the same goal without directly accessing the server is to:

  1. Reset your branch to the commit before the offending one
  2. Run git reflog expire --expire=now --all && git gc --aggressive --prune=now locally (without expiring the reflog, it would keep the dropped commit alive)
  3. Completely delete the project from your Gitlab server
  4. Create a new Gitlab project, optionally with the same name/path
  5. Push your local version to the new Gitlab project, which uploads all the history but not the unreferenced objects
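Steps 1, 2 and 5 of that list can be scripted roughly as below; steps 3 and 4 happen in the Gitlab UI. This is a sketch with a hypothetical function name, assuming your remote is called origin, that you pass the last good commit and the new project’s clone URL, and that no other local branch or tag contains the offending commit. It removes and re-adds origin, because remote-tracking branches for the old remote would otherwise keep the dropped commit alive locally. Either way, git push only sends objects reachable from the refs it pushes.

```shell
# Hypothetical helper for the "start again" route.
# Usage: reset_and_push_to_new_project <last-good-commit> <new-clone-url>
reset_and_push_to_new_project() {
  local good=$1 new_url=$2

  # Step 1: move the current branch back to the last good commit.
  git reset --hard "$good" || return 1

  # Step 2: drop the old remote (and its remote-tracking branches, which may
  # still point at the offending commit), expire the reflog, then prune
  # every unreachable object immediately.
  git remote remove origin || return 1
  git reflog expire --expire=now --all
  git gc --aggressive --prune=now --quiet

  # Step 5: push all local branches, then all tags, to the new, empty project.
  git remote add origin "$new_url"
  git push --all origin && git push --tags origin
}
```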

Even better: do not commit git crimes! They are hard to fix. It’s very much a measure-twice-and-cut-once kind of scenario.