As a full-stack developer, I rely on Git daily as my source control and versioning tool of choice. But there's always an anxiety that I may inadvertently check in sensitive files and push them publicly for the world to see. API keys, SSH credentials, .env configurations: exposing any of those could lead to security disasters.

While we all try our best to avoid these situations, mistakes do happen. In this comprehensive guide, I'll explain how Git works under the hood to manage file versions, how to undo commits containing sensitive data, and some best practices to avoid leaks in the first place.

Understanding Git's Architecture

To grasp how to rewrite commit history, you need to understand how Git structures repositories and saves point-in-time snapshots.

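At its core, Git stores data as objects, and the three object types that matter here are blobs, trees, and commits:

  • Blobs store file contents (without the filename)
  • Trees map filenames to blobs (and to other trees, for subdirectories)
  • Commits point to a top-level tree and record the commit message, author, and other metadata. Each commit except the first also points to its parent commit, which chains commits together into the history of a branch (strictly a graph rather than a linked list, since merge commits have two or more parents).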

This object model is how Git manages versions and histories in a lightweight yet robust manner.
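To make the object model concrete, here is a minimal Python sketch of how Git derives a blob's ID. It hashes a short header (the word blob, the content length, and a NUL byte) followed by the file contents with SHA-1. This matches what git hash-object prints in a default SHA-1 repository, though repositories using the newer SHA-256 object format hash differently. The function name blob_id is just for illustration.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The ID depends only on the content, never on the filename.
    print(blob_id(b"hello world\n"))  # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID depends only on content, a leaked secret keeps the same blob ID in every commit that contains it. The filename lives separately, in a tree object.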

When you run git add, Git writes new blobs for the staged file contents. git commit then writes the trees that represent your directory structure and wraps them into a commit:

Git Object Model Diagram (Source: Atlassian)

Commits point backwards to older commits, forming a timeline of all changes on a branch:

Git Commit History Diagram (Source: Atlassian)

This commit history is stored locally on your dev machine. When you git push, it is replicated to the remote as well, which these days usually means GitHub or Bitbucket.
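Commits reference their parents by ID, and a commit's own ID is a hash of its contents, including that parent line. The Python sketch below is a simplified model, not how you would create commits in practice. The author identity and timestamp are made-up fixed values, and the tree IDs are placeholders. It shows why editing one old commit changes the ID of every commit after it, which is exactly what makes rewritten history diverge from everyone else's clone.

```python
import hashlib
from typing import List, Optional


def object_id(obj_type: str, body: bytes) -> str:
    """Hash a Git object the way Git does: '<type> <size>\\0' + body, with SHA-1."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Build a simplified commit object and return its ID."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    # Made-up, fixed identity and timestamp so the example is deterministic.
    ident = "Example Dev <dev@example.com> 1700000000 +0000"
    lines += [f"author {ident}", f"committer {ident}"]
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode("utf-8"))


def build_history(trees: List[str], messages: List[str]) -> List[str]:
    """Chain commits oldest-first; each commit's parent is the previous one."""
    ids: List[str] = []
    parent: Optional[str] = None
    for tree, message in zip(trees, messages):
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids
```

If you rebuild the history with a different tree only in the first commit (say, one without the secret), every later commit gets a new ID too. This happens even when the later commits' own trees are unchanged, because their parent lines changed.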

The Peril of Pushing Secrets

With this model in mind, we can understand the root cause behind exposing confidential data in Git commits.

As developers, we often need to work with credentials like database passwords, API tokens, SSH keys, AWS access credentials, and more. These usually get added to config files that also contain application code. For example:

config.js

const dbPassword = 's3curePa$$w0rd!';

.env

DB_PASSWORD=s3curePa$$w0rd!
SECRET_KEY=1234567890

docker_env.sh

#!/bin/sh
SSH_PRIVATE_KEY="some-private-key"

When we run git add . and commit, these files and the confidential data in them get baked into blobs and trees. After pushing upstream, anyone with access to the remote can read the secrets in every commit and branch that contains them. For a public repository, that means everyone!

According to recent surveys, over 34% of organizations have had sensitive source code leak through public repositories. And these types of incidents can have huge consequences:

Data exposure and its impact:

  • Credentials: Attackers can directly access and abuse exposed credentials to breach databases, APIs and cloud services. The Bazaarvoice breach of 2012 began with an engineer accidentally posting AWS keys on GitHub.
  • API Keys: API keys directly grant access to applications and information. The Exposure Notification system was abused after a GitHub repo leaked API keys in 2020.
  • Private Keys: SSH keys give access to backend servers, so leaking them hands over the keys to the infrastructure itself. Multiple NSA projects have been hacked starting from exposed SSH private keys.
  • System Configs: Leaked config files, Docker env scripts and more reveal IP addresses, access rules and other sensitive metadata. Attackers use these to discover weak points and design targeted attacks. The 2018 Parliament Street breach used leaked system configs to fingerprint infrastructure, then execute access exploits.
  • PII and Passwords: Personally identifiable information such as SSNs, names and birth dates, as well as website credentials and financial information, can lead to identity theft and serious harm if made public through version histories. Password leaks also help attackers access other systems.

(Sources: Duo Labs, Intezer, Security Affairs)

Clearly, allowing secrets into repositories is playing with fire. So what should we do if such mistakes make their way into commits on a shared remote?

Removing Committed Secrets After Pushing

If credentials or other sensitive information has been pushed up publicly, swift action is required for damage control:

1. Revoke Compromised Secrets

If any API keys, passwords or SSH keys leaked, rotation or revocation should be the first response. Generate and update new credentials to replace the revealed ones wherever configured. This mitigates the harm that exposure can cause by preventing actual abuse or account takeovers.

For example, if a GitHub token leaked, create a new one, then revoke the leaked token and make sure it isn't added back anywhere. The same applies to AWS keys, social media app secrets, etc.

2. Delete Sensitive Files From Local Repository

Next, stop tracking the problematic files in your local clone:

git rm --cached sensitive_file.txt

The --cached option stages the file's removal from the index without deleting your local copy.

You'll get a message like:

rm 'sensitive_file.txt'

This untracks the file but leaves your working tree copy untouched, just in case. The removal takes effect once you commit it.
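Untracking a file is only half the job. Unless the file is also listed in .gitignore, the next git add . will stage it again. A small helper for that second half might look like the Python sketch below. ensure_ignored is a hypothetical name and not part of Git, and the helper only compares exact lines rather than evaluating gitignore patterns.

```python
import tempfile
from pathlib import Path


def ensure_ignored(repo_root: Path, pattern: str) -> bool:
    """Append `pattern` to <repo_root>/.gitignore unless that exact line exists.

    Returns True if .gitignore was modified. Only exact lines are compared;
    gitignore glob and negation semantics are not evaluated.
    """
    gitignore = repo_root / ".gitignore"
    text = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with gitignore.open("a", encoding="utf-8") as fh:
        fh.write(prefix + pattern + "\n")
    return True


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for a repository root.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(ensure_ignored(root, "sensitive_file.txt"))  # True: line added
        print(ensure_ignored(root, "sensitive_file.txt"))  # False: already there
```

Run from the repository root after git rm --cached, it would be called as ensure_ignored(Path("."), "sensitive_file.txt"). You would then commit the updated .gitignore together with the removal.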

3. Rewrite Commit History

With the files no longer tracked going forward, it's time to rewrite commits to erase the historical evidence as well.

Amend The Most Recent Commit

You can amend changes into the previous commit using --amend:

git commit --amend -m "Removed exposed API key"

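This replaces the last commit with a new one built from the currently staged changes. The new commit gets a new hash, although Git keeps the original author date by default. It's perfect when the secret was introduced in the most recent commit.

For commits further back in history, a more heavy-handed approach is needed, since every commit after the one that introduced the secret has to be rewritten too.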

Force Push Overwrite

We can force push a branch to overwrite the remote:

git push origin main --force

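This makes Git overwrite the remote branch with your local branch as it is, discarding any remote commits that are not part of your local history. Note that this only erases the secret if your local history no longer contains it anywhere (for example, after the amend above). Force pushing a history that still includes the leaky commit removes nothing.

But this is very dangerous and destructive, especially for shared repositories. Force pushing rewrites history that other developers have already pulled, so their local branches diverge from the remote, which can lead to massive syncing issues: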

Force Push Fallout (via Reddit u/OnlyTheRealAdvice)

So only attempt force pushes on private personal repos, or on branches nobody else is collaborating on.
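Whether a force push actually gets rid of a secret depends on what is still reachable from the tip you push. The Python sketch below models a branch as a map from each commit to its single parent. Merges are ignored for simplicity, and the commit names are made up. It shows that pushing a later commit that merely deletes the file leaves the leaky commit in the branch history. Force pushing a rewritten chain makes the leaky commit unreachable.

```python
from typing import Dict, Optional, Set


def reachable(tip: str, parents: Dict[str, Optional[str]]) -> Set[str]:
    """Collect every commit reachable from `tip` by following parent links."""
    seen: Set[str] = set()
    current: Optional[str] = tip
    while current is not None and current not in seen:
        seen.add(current)
        current = parents[current]
    return seen


def secret_still_on_branch(
    tip: str, parents: Dict[str, Optional[str]], leaky: Set[str]
) -> bool:
    """True if any commit that contains the secret is reachable from the tip."""
    return not reachable(tip, parents).isdisjoint(leaky)


# Hypothetical history: A <- B (adds secret) <- C (deletes the file).
# B2 and C2 are rewritten versions of B and C without the secret.
PARENTS: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "B", "B2": "A", "C2": "B2",
}
LEAKY = {"B"}

if __name__ == "__main__":
    print(secret_still_on_branch("C", PARENTS, LEAKY))   # True: B is still in history
    print(secret_still_on_branch("C2", PARENTS, LEAKY))  # False: B is unreachable
```

Unreachable is not the same as deleted. A hosting provider may keep unreachable objects or cached views for a while, which is why clearing server-side caches still matters.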

4. Clear File Caches

Some Git servers generate caches or archives of file contents that may still retain copies of secrets even after history rewrites.

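For GitHub specifically, deleting the file through the web interface only adds a new commit; the old blobs and cached views remain. GitHub's documentation on removing sensitive data recommends contacting GitHub Support, once you have rewritten history, to remove cached views and references to the sensitive data in pull requests.

For Bitbucket and other hosts, check the provider's documentation or support channels for how they purge server-side copies.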

5. Inform Collaborators

If you shared sensitive information publicly even for a short period, consider letting collaborators know a potential compromise occurred.

At work, this may mean disclosing the leak to your company's security officer or administrators.

For open source projects, post an issue informing fellow maintainers of what transpired as well as steps contributors should take, like resetting local branches or updating credentials.

Being upfront about accidental exposures shows accountability and allows teams to address repercussions together.

Alternatives To History Rewrites

Force pushing and rewriting shared commit history should be an absolute last resort since it can seriously disrupt workflows. Some safer alternatives to consider first:

Revert Changes

Creating a new commit that reverts previous changes avoids mutating shared history:

git revert 0766c053..HEAD

But the sensitive file names and contents stay in the old commits. Anyone with access to the repository can still read them through git log or by checking out an earlier commit.

Reverse Applying Patches

You can extract a commit as a patch with git format-patch, then apply it in reverse with git apply -R to undo its changes:

git format-patch -1 faulty-commit-hash --stdout > undo.patch
git apply -R --index undo.patch

You then commit the result. As with revert, the old commits remain intact.

git-scrub

For highly sensitive content like passwords and keys, git-scrub aims to purge file contents from current and historical commits by overwriting them with garbage content.

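This keeps the shape of the history (commit messages, authors and order) intact while erasing the sensitive information, although every affected commit still gets a new hash. Tree objects, however, still record the original filenames, which may let someone infer what was hidden.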
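Overwriting a secret in place still changes history at the object level. The scrubbed file is a different blob with a different ID, so every tree and commit above it gets a new ID too. The Python sketch below is a hypothetical illustration of content scrubbing, not git-scrub's actual implementation, and the ***REMOVED*** placeholder is an arbitrary choice.

```python
import hashlib
from typing import Iterable


def blob_id(content: bytes) -> str:
    """SHA-1 object ID Git would assign to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def scrub(
    content: bytes, secrets: Iterable[bytes], placeholder: bytes = b"***REMOVED***"
) -> bytes:
    """Replace every occurrence of each secret with a placeholder."""
    for secret in secrets:
        content = content.replace(secret, placeholder)
    return content


original = b"DB_PASSWORD=s3curePa$$w0rd!\n"
cleaned = scrub(original, [b"s3curePa$$w0rd!"])

if __name__ == "__main__":
    print(cleaned.decode())                       # DB_PASSWORD=***REMOVED***
    print(blob_id(original) == blob_id(cleaned))  # False: a brand-new blob
```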

Delete and Recreate Repositories

As a last option, you can delete the repository and push a fresh copy whose history has been cleaned or re-initialized from scratch, so that no old commits come along. This may be necessary for highly confidential leaks published more widely.

But obviously rebuilding a repo has major drawbacks like losing stars, watchers, issues, wiki pages and other metadata.

Preventing Future Leaks

While fixing exposed files occupies our immediate attention, long-term prevention is key to keeping secrets out of version control altogether. Here are some best practices to incorporate:

Segregate Configuration

Store configuration like credentials and keys in dedicated files excluded from source control instead of scattered through app code.

.env Files

.env files gather environment variables in a single location for apps to load configs from. Keep them out of version control:

.gitignore

# .env configurations
.env
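Once the .env file is ignored, the application still needs to read it at runtime. Libraries such as python-dotenv handle this in practice. The Python sketch below is a deliberately minimal stand-in that shows the idea: the file stays on the developer's machine and only its values reach the process. parse_env and load_env_file are hypothetical names.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no variable expansion, escapes, `export` prefixes,
    or multi-line values. Matching single or double quotes are stripped.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str) -> None:
    """Load a .env file into os.environ without overriding existing variables."""
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            os.environ.setdefault(key, value)
```

Using setdefault means a value already injected by the shell or a CI/CD pipeline wins over the local file.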

docker-compose Env Files

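Docker env files pass environment variables into containers. Add the env files to .gitignore, but keep docker-compose.yml itself under version control, as long as it references those files instead of containing secrets:

.gitignore

# Docker env files (referenced from docker-compose.yml)
.env
*.env

This keeps confidential data in ignored files that never get pushed upstream, rather than in the app code itself.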

Secret Management Services

Vaults and secret managers like HashiCorp Vault, AWS Secrets Manager and GCP Secret Manager provide dedicated encryption, access control and auditing around credentials.

Apps access them through runtime calls instead of checking secrets into source control.
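As a concrete illustration, with AWS Secrets Manager and boto3 a runtime lookup might look like the Python sketch below. The client is passed in (for example boto3.client("secretsmanager")) so the function can be exercised without AWS credentials. The secret ID prod/app/db and the assumption that the secret was stored as a JSON string are hypothetical.

```python
import json
from typing import Any, Dict


def fetch_db_credentials(client: Any, secret_id: str) -> Dict[str, str]:
    """Fetch a JSON-encoded secret from AWS Secrets Manager at runtime.

    `client` is expected to be a boto3 Secrets Manager client; it is injected
    so tests can substitute a stub.
    """
    response = client.get_secret_value(SecretId=secret_id)
    # SecretString holds the secret text; here we assume it was stored as JSON.
    return json.loads(response["SecretString"])
```

Usage would be along the lines of creds = fetch_db_credentials(boto3.client("secretsmanager"), "prod/app/db"). The repository only ever contains the secret's name, never its value.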

Environment Variables

Storing confidential values in environment variables keeps them out of versioned files completely.

Code only reads from the process environment, so secrets are supplied at runtime rather than committed. CI/CD pipelines typically inject production credentials this way.
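In code, this means reading secrets from the process environment instead of hardcoding them as the config.js example earlier did. The Python sketch below also fails fast when a variable is missing, so a misconfigured deployment errors out at startup rather than running with an empty password. require_env and MissingSecretError are hypothetical names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_env(name: str) -> str:
    """Read a secret from the process environment, failing loudly if absent or empty."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

The hardcoded dbPassword line then becomes db_password = require_env("DB_PASSWORD"), with the value supplied by a local .env loader in development and by the pipeline in production.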

Security Linter Integration

Catch accidental uploads early by integrating security-focused linters into pre-commit hooks. GitGuardian, Snyk and TruffleHog scan changes for secrets and for files matching configured rules around sensitive patterns.

Adding them as pre-push hooks stops leaks before they reach remotes, rather than dealing with them downstream. Custom policies tailored to organizational concerns help as well.
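To show the basic idea behind such scanners, here is a toy Python pattern matcher. It is not how GitGuardian, Snyk or TruffleHog work internally, since those ship far larger, tuned rule sets. The three rules below are illustrative assumptions: the AKIA prefix of AWS access key IDs, PEM private key headers, and a crude "password/secret/key = long value" heuristic.

```python
import re
from typing import List, Tuple

# Illustrative rules only; real scanners use many more, with entropy checks.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded credential": re.compile(
        r"(?i)(password|passwd|secret|api_?key|token)\w*\s*[=:]\s*['\"]?[^\s'\"]{8,}"
    ),
}


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every line that matches a rule."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A hypothetical pre-commit hook could feed it the output of git diff --cached and exit non-zero whenever scan returns any findings, which makes Git abort the commit.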

Restrict Remote Access

Limiting what repositories non-essential personnel have access to reduces attack surfaces from exposed credentials. Audit permissions and restrict rights to the minimum viable set.

For server access, I rely exclusively on SSH certificate authentication rather than password authentication. This allows locking down access by source IP, user and other policies without ever putting secrets into code.

Conclusion

Exposing API tokens or other confidential data publicly poses tremendous risks and anxiety. Though accidents happen, sensitive information should never reach remote repositories.

Recovering from such incidents requires swiftly revoking compromised credentials, surgically rewriting history, and clearing server-side caches. Just beware that history rewrites on shared repositories have extensive destructive side effects.

Ultimately, practicing careful versioning hygiene is the best remedy. Reserve secrets for ignored config files, leverage secret management systems, use environment variables, install guardrail linters, and restrict repository access.

Embracing these precautions helps developers code at ease again, knowing one clumsy git push won't lead to catastrophe.
