Threat Modeling GitHub - How Vulnerable by Design Is GitHub?
Understanding the Security Debt Baked Into GitHub’s Design
Introduction
In the wake of the recent tj-actions/changed-files compromise on March 14, 2025, I decided to take a hard look at GitHub’s security posture. This incident, where an attacker gained write access to a popular GitHub Action used by thousands of workflows, inserted malicious code, and retagged releases so even pinned versions pointed to the tainted commit, serves as a stark reminder of the security implications inherent in GitHub's design.
While Step Security deserves recognition for identifying this attack, the incident highlights a broader concern: GitHub’s architecture presents unique security considerations that go beyond simple bugs or misconfigurations. As the central hub for both open-source and proprietary code across the industry, GitHub has become a high-value target for sophisticated threat actors. When a single compromise can potentially impact thousands of downstream organizations simultaneously, understanding the platform's inherent design vulnerabilities becomes crucial.
This in-depth threat model analyzes several categories of design-level vulnerabilities in GitHub, highlighting how attackers could exploit inherent design choices rather than just transient bugs. Each flaw stems from fundamental trade-offs between collaboration, usability, and security – trade-offs that affect every organization using GitHub.
Approach
My approach to this threat model differs from conventional vulnerability analysis. Rather than focusing on implementation bugs or misconfigurations that GitHub might quickly patch, I've examined the platform's fundamental architectural decisions and their security implications. This analysis goes beyond identifying what could go wrong to understand why certain vulnerabilities exist by design. For each category, I've considered the tension between GitHub's collaboration-first ethos and security best practices, identifying where these tensions create exploitable gaps. By taking a systems-level view of GitHub's architecture and examining trust boundaries, privilege models, and data flows, this threat model reveals security considerations that persist regardless of patch cycles or configuration changes.
It's important to acknowledge that many of the design choices highlighted in this threat model aren't simply "flaws" to be fixed, but rather intentional trade-offs GitHub has made to facilitate collaboration and developer productivity. GitHub has built its platform primarily to enable seamless code sharing and integration - features that have revolutionized software development. Many of the security considerations identified here exist precisely because GitHub prioritizes openness and ease of use, which are core to its value proposition. Perfect security would likely come at the cost of the very features that make GitHub valuable to millions of developers worldwide.
1. Supply Chain Attacks on GitHub
GitHub is a central hub for open-source and internal code, making it a ripe target for software supply chain attacks. Dependency management and trust are core concerns: projects often pull in hundreds of external packages or actions, and a single compromised dependency can cascade into a breach. GitHub's dependency graph and Dependabot alerts help identify known vulnerable packages, but novel attacks can slip past these defenses.
Dependency Confusion & Namespace Attacks: GitHub users frequently rely on private packages alongside public ones. An attacker can exploit this by publishing a malicious package with the same name as an internal dependency to a public registry. Build systems that don't strictly prefer the private source may fetch the attacker's package, giving the attacker code execution within the victim's environment. This dependency confusion tactic has successfully infiltrated major companies' networks. GitHub's design does not inherently prevent such mix-ups; it relies on developers to scope package names or use private registries correctly.
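The resolution logic at the heart of dependency confusion can be sketched with a hypothetical resolver. Real package managers differ in detail (and the package name below is invented), but the core flaw is the same: a "highest version wins" merge of private and public indexes hands control to whoever publishes the larger version number publicly.

```python
# Hypothetical sketch of naive package resolution across two indexes.
# A resolver that merges private and public candidates and picks the
# highest version will fetch the attacker's package if the attacker
# publishes "internal-utils" publicly with an inflated version number.

def naive_resolve(name, private_index, public_index):
    """Return (source, version) for the highest version seen anywhere."""
    candidates = []
    for source, index in (("private", private_index), ("public", public_index)):
        if name in index:
            candidates.append((index[name], source))
    if not candidates:
        raise LookupError(name)
    version, source = max(candidates)  # highest version wins -- the flaw
    return source, version

private_index = {"internal-utils": (1, 4, 0)}
public_index = {"internal-utils": (99, 0, 0)}  # attacker-published

print(naive_resolve("internal-utils", private_index, public_index))
# -> ('public', (99, 0, 0)): the build pulls the attacker's package
```

The mitigation follows directly from the sketch: resolution must prefer (or be pinned to) the private index for internal names, rather than merging candidates across trust boundaries.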
Package Hijacking, Repo Jacking & Namespace Reuse Attacks: Many package managers (npm, PyPI, etc.) and GitHub's own Package Registry trust the maintainer of a given name. If a maintainer's GitHub account is compromised or if they relinquish a namespace, an attacker can hijack it. For example, if a GitHub user or org renames their account, GitHub will redirect dependency links to the new name – until someone else claims the old name. Attackers exploit this by taking over abandoned usernames ("RepoJacking"), then creating repositories or packages with the same name. Downstream projects pulling these as dependencies will unwittingly run the attacker's code. GitHub has attempted to mitigate this (e.g., reserving names with high clone counts), but the protections are incomplete and can be bypassed, leaving millions of repos potentially vulnerable.
Additionally, when a repository (especially under a user account) is renamed or transferred, the old name can potentially be claimed by someone else. GitHub’s redirection of traffic from old name to new is helpful, but it creates a window where a malicious actor can hijack traffic or dependency links. Projects depending on code at olduser/oldrepo might unknowingly pull from the attacker’s repo after the hijack.
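The rename-redirect lifecycle above can be modeled as a toy registry (this is not GitHub's implementation, just an illustration of the behavior): after a rename, the old name silently redirects to the new location, and the redirect dies the moment anyone claims the old name.

```python
# Toy model (not GitHub's implementation) of rename redirects:
# after a rename, the old name redirects to the new repo -- until a
# new account claims the old name, at which point the redirect dies
# and dependents silently fetch the claimer's code.

class Registry:
    def __init__(self):
        self.repos = {}      # "owner/name" -> payload
        self.redirects = {}  # old full name -> new full name

    def create(self, full_name, payload):
        self.repos[full_name] = payload
        self.redirects.pop(full_name, None)  # claiming kills the redirect

    def rename(self, old, new):
        self.repos[new] = self.repos.pop(old)
        self.redirects[old] = new

    def fetch(self, full_name):
        if full_name in self.repos:
            return self.repos[full_name]
        return self.repos[self.redirects[full_name]]

r = Registry()
r.create("alice/widgets", "legit code")
r.rename("alice/widgets", "acme/widgets")
assert r.fetch("alice/widgets") == "legit code"   # redirect still works
r.create("alice/widgets", "malicious code")       # attacker claims old name
print(r.fetch("alice/widgets"))                   # -> malicious code
```

Dependents that reference olduser/oldrepo never see an error at any point in this sequence, which is what makes the hijack window so quiet.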
Malicious Repository Clones (Repo "Confusion"): Beyond package managers, attackers may target developers' trust in GitHub search results. Adversaries can clone popular repositories, infuse malicious code, and re-upload them under identical names. They can then fork these en masse to boost visibility. Unsuspecting developers searching GitHub might mistake these malicious copies for the legitimate project and incorporate them. Unlike dependency confusion, which exploits automated systems, this "repo confusion" banks on human error. GitHub's design (easy forking, naming overlap) enables this—there's no built-in verification of repository authenticity unless the project is a verified GitHub organization or uses signed releases.
Third-Party GitHub Actions as Dependencies: GitHub Actions workflows often reuse community actions (essentially code packages), and a compromised action is a supply chain backdoor; this is the class of attack that prompted this exercise. The incident on March 14, 2025 involved an attacker gaining write access to the popular tj-actions/changed-files action, inserting malicious code, and retagging releases so even pinned versions pointed to the tainted commit. Many workflows trust version tags, assuming they're immutable, but Git tags can be moved by anyone with repo write access. This design gap meant that thousands of pipelines pulled the updated action code containing an exploit. The payload in that case scanned the CI runner's memory for secrets and printed them to build logs, although a cleverer exploit would have exfiltrated them to an attacker-controlled location. GitHub does not yet enforce signed action releases or integrity verification by default, so the trust model for actions is largely based on maintainer reputation.
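Because tags are mutable, the standard hardening advice is to pin actions to a full 40-character commit SHA rather than a tag. A small audit along these lines (a sketch that only parses simple "uses: owner/repo@ref" lines, not the full workflow grammar) can flag tag-pinned actions:

```python
import re

# Flag action references pinned to a mutable tag rather than an
# immutable 40-hex commit SHA. Input is workflow file text; we only
# handle simple "uses: owner/repo@ref" lines for illustration.

USES_RE = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@(\S+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")

def tag_pinned_actions(workflow_text):
    risky = []
    for repo, ref in USES_RE.findall(workflow_text):
        if not SHA_RE.match(ref):
            risky.append(f"{repo}@{ref}")
    return risky

workflow = """
    - uses: actions/checkout@v4
    - uses: tj-actions/changed-files@41d8c9e1a7c0b1b0f0f3b6d2e9a1c4d5e6f7a8b9
"""
print(tag_pinned_actions(workflow))
# -> ['actions/checkout@v4']: the tag pin is flagged, the SHA pin is not
```

In the tj-actions incident it was precisely the tag pins that were retargeted; workflows pinned to a commit SHA did not pull the tainted code.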
2. Access Control Design Issues
GitHub’s role-based access control (RBAC) model is relatively simple – perhaps too simple for complex enterprises. It was originally built for open-source collaboration, with security added later. At the organization level, roles like “Owner” and “Member” exist, and at the repository level, roles include “Admin”, “Maintain”, “Write”, “Read” (and “None”). While simplicity aids usability, it can introduce security gaps:
Overly Broad Default Access: As noted, an org's "base permission" for members can unintentionally overexpose data. If set to "Read", every member who joins the org instantly gains read access to all internal repositories. This is a design choice that favors open collaboration inside organizations, but in a security context it's risky. A single compromised member account (even with no explicit team or repo assignments) could browse and clone everything. Many enterprises now set this to "None" to enforce least privilege, but GitHub does not default to "None", leaving room for misconfiguration, which is common in early-stage startups without dedicated security teams.
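Auditing this setting is straightforward. The field name below matches the `default_repository_permission` field in the GitHub REST API's "Get an organization" response; the HTTP fetch is omitted so this stays a pure sketch over the response shape:

```python
# Audit an organization's base permission. The field name
# "default_repository_permission" follows the GitHub REST API's
# "Get an organization" response; fetching it over HTTP is left
# out so the check stays pure and testable.

def audit_base_permission(org_settings):
    perm = org_settings.get("default_repository_permission", "read")
    if perm != "none":
        return (f"WARNING: base permission is '{perm}' -- every member "
                f"can access all internal/private repos at that level")
    return "OK: base permission is 'none' (least privilege)"

print(audit_base_permission({"default_repository_permission": "read"}))
print(audit_base_permission({"default_repository_permission": "none"}))
```

A periodic check like this catches the drift that occurs when orgs are created with the permissive default and never revisited.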
Lack of Granular Privilege Separation: Repository admins inherently can do almost anything in that repo, including changing settings, secrets, or deleting it. There’s no built-in way to require dual approval for destructive actions like deleting a repo or transferring ownership – one admin can execute those. In an organization, Owners have administrative rights over all repos by design (they can add themselves to any repo). This means a phished or malicious owner account is essentially a total compromise of that organization’s GitHub presence. GitHub’s design does not allow scoping an owner’s power to subsets of repositories – owners are all-powerful. The only mitigation is to have as few owners as possible and protect those accounts (e.g. with 2FA and hardware keys).
Team and Repository Permissions Quirks: GitHub’s permission inheritance can be a double-edged sword. Teams simplify managing many users’ access, but a team maintainer (a role that can manage team membership without full org ownership) could add an unintended user to a team that grants repo access, if not carefully monitored. Also, outside collaborators (users given direct access to a single private repo without being org members) have no visibility into org context – which is good for isolation, but it means org policies (like SAML enforcement, IP restrictions) might not apply to them. This could be a loophole: for example, a contractor (with BYOD policy) is added as an outside collaborator to a repo, bypassing SSO requirements that regular org members must follow, thereby becoming a weak link for attackers.
Branch Protection and Bypass: Branch protection rules (require PR reviews, prevent force pushes, etc.) are a key security feature, but certain roles can bypass them. By default, admins can push despite protection rules (unless an explicit setting applies the rules to admins as well). Organization owners can also disable or edit these rules. The model assumes these high-privilege users are trustworthy, but if they are compromised, the protections evaporate. In essence, GitHub's access control is flat at the top; it doesn't inherently enforce checks on top-tier roles. Admins are risky because, for many companies, GitHub is the company: their IP lives there. Compromise an admin, and you compromise everything.
Privileged GitHub Apps & Tokens: GitHub encourages integration via OAuth apps, GitHub Apps, and personal access tokens (PATs). A subtle design issue is that PATs, until recently, were all-or-nothing per scope for an account: if a user's PAT had the "repo" scope, it could act on all the user's repos. This means that if an attacker phishes a developer's PAT, they might gain broader access than intended. GitHub's newer fine-grained PATs address this by allowing repo-specific tokens, but legacy tokens remain widely used and risky. Similarly, a GitHub App installed on an org can be granted wide permissions; if the app's key is stolen, an attacker inherits those privileges. The design challenge is balancing integration flexibility with the principle of least privilege; historically, GitHub has erred on the side of flexibility.
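The two PAT generations are distinguishable by their documented prefixes (classic PATs begin with ghp_, fine-grained ones with github_pat_), which makes a rough triage of leaked or inventoried tokens easy to script:

```python
# Rough classification of GitHub token types by their documented
# prefixes. Classic PATs (ghp_) carry account-wide scopes; fine-grained
# PATs (github_pat_) can be restricted to specific repositories.

TOKEN_PREFIXES = {
    "github_pat_": "fine-grained PAT (repo-scoped, preferred)",
    "ghp_": "classic PAT (account-wide scopes -- high blast radius)",
    "gho_": "OAuth app token",
    "ghs_": "GitHub App installation token",
}

def classify_token(token):
    for prefix, kind in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return kind
    return "unknown"

print(classify_token("ghp_XXXXEXAMPLEXXXX"))  # hypothetical token value
# -> classic PAT (account-wide scopes -- high blast radius)
```

An inventory that still turns up ghp_-prefixed tokens in CI secrets is a signal to migrate those integrations to fine-grained PATs or GitHub Apps.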
3. GitHub Actions & CI/CD Security Issues
GitHub Actions brings CI/CD directly into the platform, but its design can introduce novel security pitfalls. Workflows run in response to repository events and can perform virtually any action (compiling code, deploying, etc.), often with access to secrets and the repository content. The threat model for Actions must consider untrusted input (like pull request code) and the need to isolate privileges. Key design-level vulnerabilities include:
Unauthorized Code Execution via Workflow Triggers: GitHub Actions has special trigger events like pull_request_target and workflow_run that intentionally run with higher privileges (access to secrets, write permissions) to enable certain use cases. However, these can be abused. For example, a malicious forked repository can open a PR to the base repo and include a workflow file (GitHub allows workflow files in PRs). If the base repository uses pull_request_target, that workflow will execute in the context of the base repo (with its secrets) before the code is merged. GitHub added safeguards (by default, first-time contributors' workflows require manual approval, and the default GITHUB_TOKEN for forked PRs is read-only), but maintainers may override these or inadvertently run a malicious PR's code. The workflow_run event is even less well known: it triggers a second workflow after one completes, often used to perform privileged actions post-testing (like posting statuses or deploying). If not carefully handled, an attacker's PR (even with limited initial rights) could manipulate artifacts or outputs such that when the workflow_run workflow triggers, it executes attacker-controlled code with full privileges. Many popular repositories were found vulnerable to such patterns because the design trusts the link between unprivileged and privileged workflows without strong validation. Read more at legitsecurity.com.
Pull Request Hijacking: In this GitHub workflow experiment, several important behaviors and risks were observed. Pull requests (PRs) can trigger workflows on pull_request:synchronize even when the code is on an unmerged feature branch. These workflows may still access repository- and organization-level secrets unless restricted using scoped environments. PR authors cannot approve their own pull requests, and if they are the last person to push to the PR, their approval does not make the PR mergeable, though it still counts toward code owner approval. Additionally, contributors can hijack PRs by pushing new commits, even cherry-picking commits with spoofed or unknown authors. GitHub workflows may also be used to commit changes as the github-actions[bot] identity, making attribution and review tracing harder. These behaviors reveal several ways in which the trust boundaries in GitHub Actions can be bypassed or misused when not carefully restricted. There are multiple scenarios where Alice is the author and Bob is the malicious committer, or where Bob uses github-actions[bot] or another app (or a service account with write access) to push, approve, and merge malicious code. Most of them can be prevented with branch protection rules and protected branches. The one scenario that cannot be prevented under the current GitHub model:
Alice opens a pull request.
A separate GitHub Action (e.g., triggered by another repo or branch) commits vulnerable or malicious code to Alice's PR using the github-actions[bot] identity.
Bob, a code owner, reviews and approves the PR.
The PR is now mergeable because:
- Bob did not push last, so his approval counts.
- Code owner approval is satisfied.
Bob merges the PR, intentionally allowing the vulnerable code.
Why it's unpreventable:
- There is no way to cryptographically prove or prevent that the merged code wasn't crafted or injected by the approving code owner via indirect means (e.g., another workflow, API call, or bot commit).
- GitHub does not verify authorship linkage between the approver and the last pusher when the pusher is a bot (e.g., github-actions[bot]).
- The system assumes code owner trust is sufficient, even when commits are made by a service account.
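The approval rule at play can be modeled as a small function. This is a simplification of GitHub's actual mergeability logic, assuming "the last pusher cannot self-approve" is the only stale-review rule in force, but it makes the bot loophole concrete:

```python
# Simplified model of the mergeability rule described above: an
# approval counts unless the approver was the last pusher -- and a
# push made by a bot identity never invalidates a human's approval.

def is_mergeable(last_pusher, approvers, code_owners):
    valid = {a for a in approvers if a != last_pusher}
    return bool(valid & set(code_owners))

# Bob pushes last, then approves: his approval does not count.
assert not is_mergeable("bob", {"bob"}, {"bob"})

# The bot pushes the (malicious) final commit; Bob approves, and the
# PR is mergeable even though Bob may have driven the bot's commit.
assert is_mergeable("github-actions[bot]", {"bob"}, {"bob"})
print("bot-pushed PR is mergeable with Bob's approval")
```

The gap is visible in the second case: routing the final push through a service identity launders the "approver pushed last" check.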
Token Leaks and Scope Creep: Every Actions run gets a GITHUB_TOKEN (and potentially other secrets). By design, this token's permissions depend on the event. However, prior to recent changes, the default token had quite broad write access to the repo. Even now, if a workflow or an action within it needs broader permissions, maintainers might grant contents: write via the workflow's permissions key (for example) or provide PATs as secrets, which then become available to the runner. Attackers exploit this by tricking workflows into revealing these tokens. One common vector is exfiltrating secrets via logs or artifacts. Although GitHub masks known secret values in logs, sophisticated attacks can bypass this by splitting, concatenating, or encoding secrets. The compromised tj-actions/changed-files incident is a prime example: the malicious code dumped secrets from memory and printed them in parts to the log, evading straightforward pattern matching. If repository logs are public (as in public repos or public Actions artifacts), the secrets become exposed to anyone. This is a design tension: CI logs need to be accessible for debugging, but they can become a channel for secret leakage if an attacker gains the ability to print sensitive data. Moreover, anyone with write access could delete workflow run logs once the secret exfiltration is complete.
Shared Infrastructure & Multi-Workflow Contamination: GitHub Actions enables caching and artifact sharing to speed up workflows. However, these introduce trust issues across workflow runs. GitHub's cache mechanism allows one workflow to restore cache entries saved by a previous workflow run (keyed by arbitrary strings). There is no built-in segregation of cache by trust level. This means that if an attacker can influence a low-privileged workflow (say, on a pull request), they could store a malicious payload in the cache. Later, a privileged workflow (e.g., on the main branch) might restore that cache and inadvertently execute the malicious code. A single compromised run can taint the cache for all future runs, persisting until the cache is explicitly updated. GitHub acknowledged the issue but currently considers it a trade-off rather than a vulnerability; the cache's cross-run nature is "not considered a bug". Similarly, artifact reuse between workflows had a flaw: the API to download artifacts didn't distinguish whether an artifact came from a forked repository context. An attacker could craft an artifact in a fork (perhaps a compiled binary), and if the base repo's workflow naively pulled the "latest artifact", it might grab the fork's artifact. GitHub partially mitigated this by adding metadata so workflows can check the source, but the responsibility is on maintainers to implement those checks.
These are design issues where GitHub provides the feature (sharing data between runs) without enforcing security boundaries, leaving room for abuse.
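The cache-poisoning pattern can be shown with a toy model of a cross-run cache (a deliberate simplification, not the Actions cache implementation): entries are keyed by arbitrary strings with no notion of which trust level wrote them.

```python
# Toy model of an Actions-style cache shared across workflow runs.
# Keys are arbitrary strings and carry no trust level, so a
# low-privilege (fork PR) run can seed an entry that a later
# privileged (main-branch) run restores and uses.

cache = {}  # key -> payload, shared across all runs

def run_workflow(privileged, key, build):
    if key in cache:
        payload = cache[key]          # cache hit: no provenance check
    else:
        payload = build()
        cache[key] = payload
    return ("PRIVILEGED" if privileged else "unprivileged", payload)

# 1. An attacker-influenced PR run seeds the cache.
run_workflow(False, "deps-v1", lambda: "malicious toolchain")

# 2. A later main-branch run restores the poisoned entry instead of
#    rebuilding a clean one.
print(run_workflow(True, "deps-v1", lambda: "clean toolchain"))
# -> ('PRIVILEGED', 'malicious toolchain')
```

The defensive takeaway is to partition cache keys by trust level (e.g., per-branch or per-event keys) so privileged runs never restore entries written by unprivileged ones.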
Self-Hosted Runners & Infrastructure Lateral Movement: Organizations can add self-hosted runners for Actions (to run jobs on their own VMs or hardware). This introduces a critical design risk: code from GitHub (possibly from a fork PR) can execute on machines inside the company network. An attacker could use a seemingly benign PR to execute malicious code on a self-hosted runner, creating a foothold for lateral movement across internal networks. While GitHub provides some controls (like runner groups and labels to restrict which workflows can use certain runners), these mechanisms rely primarily on proper configuration rather than enforced isolation. This risk is significantly amplified by GitHub's secret handling design: the secret masking mechanism only prevents secrets from being displayed in workflow logs; it does not restrict code from accessing or using those secrets during execution. While secrets appear as *** in logs, they remain fully accessible to any code running in the workflow as environment variables. When malicious code executes on self-hosted runners, it can freely read those environment variables, regardless of log masking. Self-hosted runners typically operate with elevated privileges (for container builds, deployment tasks, etc.) and have network connectivity to both GitHub and internal resources. This creates a perfect attack pathway: compromise a workflow, access the (masked) secrets during execution, then use the runner's privileged position to pivot deeper into the organization's infrastructure.
Documented incidents of cryptomining attacks on GitHub's hosted runners demonstrate the feasibility of workflow exploitation; when extended to self-hosted environments with access to production secrets and internal networks, the potential impact becomes significantly more severe.
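The gap between log masking and actual secret access can be approximated as exact-string replacement over log output. The sketch below is a deliberate simplification (the real masker is more sophisticated, and the API_KEY value is a made-up placeholder), but it shows both why *** appears in logs and why the code itself reads the secret untouched:

```python
import os

# Simplified stand-in for log secret masking: replace exact secret
# values with ***. The core limitation it illustrates is real: masking
# filters *logs*, not the code's access to the secret itself.

def mask(line, secrets):
    for s in secrets:
        line = line.replace(s, "***")
    return line

os.environ["API_KEY"] = "sk-live-abc123"   # hypothetical secret value
secret = os.environ["API_KEY"]             # workflow code reads it freely
secrets = [secret]

print(mask(f"key={secret}", secrets))      # -> key=***
half = len(secret) // 2
print(mask(f"{secret[:half]} | {secret[half:]}", secrets))
# -> sk-live | -abc123  (split halves slip past exact matching)
```

This is essentially the evasion the tj-actions payload used: print the secret in fragments so no single log line matches the masked value.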
In summary, the design of GitHub Actions blends trust between code from potentially untrusted sources and high-privilege automation.
4. Secrets Management Issues
GitHub repositories often need to store API keys, credentials, and other secrets for use in CI workflows or applications. GitHub has introduced several features for secret management and scanning, but there are inherent limitations and design trade-offs that attackers can exploit.
Secrets in Repositories and Git History: By design, Git will retain anything committed unless it’s purged from history. This means if a developer accidentally commits a password or key, removing it in a later commit doesn’t erase the secret from the Git history. Attackers frequently scan GitHub for such exposed secrets. GitHub’s secret scanning service can automatically detect known secret patterns in public repos (and private repos if enabled). However, it’s pattern-based and might not catch custom or obscure credentials. An attacker can split a secret across multiple lines or encode it to evade detection. Moreover, secret scanning alerts typically fire after the commit is already pushed (except for push protection which blocks some well-known secrets in real-time). Thus a window of exposure exists where a secret may leak. If a project isn’t enrolled in Advanced Security or any other secret scanning service, or if the secret format isn’t one of the 200+ patterns GitHub recognizes, the leak can go unnoticed by GitHub. False negatives in secret scanning (i.e., missed detections) are an ongoing challenge, inherent to the pattern-matching design – attackers and even well-meaning developers can unintentionally circumvent the patterns.
Effectiveness of Secret Scanning and Push Protection: GitHub's push protection is a proactive measure that rejects commits containing certain types of secrets (like AWS keys, Azure keys, etc.) by checking against known regex patterns. This is a great feature, but it only covers tokens from providers who have partnered with GitHub or those that match specific formats. If a secret doesn't meet these criteria (for example, a database password like SuperSecret!123 or an internal API key), GitHub won't block it. Even for detectable secrets, push protection can be bypassed if the secret is slightly modified (e.g., by concatenating strings). Attackers are well aware of how secret scanning works and run tools that search public GitHub code for leaked credentials immediately, taking advantage of any gap between commit and detection. Another aspect is who gets notified: for public repos, GitHub may alert the cloud provider or partner rather than you. For example, GitHub tells AWS if an AWS key is leaked so AWS can revoke it, which is good but doesn't guarantee immediate action on your account. This notification disconnect creates a critical time window in which your credentials remain valid but outside your control. By the time you're alerted through official channels, attackers may have already discovered and exploited the leaked credentials. For custom tokens or non-partner credentials, there may be no notification at all unless you've configured webhook alerts or actively monitor GitHub's security tab. Nor does this account for the service disruption it can cause: when a cloud provider automatically revokes leaked credentials without your immediate knowledge, running services that depend on those credentials can suddenly fail. This creates an operational blind spot where your team may be troubleshooting mysterious service outages without realizing they stem from a secret leak. This "protection" mechanism can inadvertently transform a security incident into a reliability incident, potentially affecting production systems before your team even becomes aware of the underlying cause.
Secrets in Actions and Storage Mechanisms: When using GitHub Actions, secrets are stored in encrypted form and provided as environment variables to workflows.
Only users with admin access to a repo can add or update secrets, and once stored, the secret value isn't viewable via the web UI; it's write-only (which is good). However, any user who can commit code that runs in Actions (i.e., with write access) can potentially cause that secret to be revealed (for example, by echoing it in the workflow), because the runner will have the secret. This means the real access control on repository secrets is actually the set of people who can push code to the repository. GitHub's design choice is that secrets are accessible to all code in the repo by default. To mitigate exposure to forks, GitHub does not expose repository secrets to workflows triggered from forked repos (hence maintainers must be careful with pull_request_target etc., as discussed). For additional safety, GitHub introduced environment secrets and protection rules; e.g., you can mark an environment like "Production" and require manual approval before a deployment job (with production secrets) runs. This adds a checkpoint, but not many projects use environment rules thoroughly. Also, if a PR is auto-merged (as in the tj-actions/changed-files compromise) or merged with malicious code overlooked in review, workflows running in the Production environment would still be able to exfiltrate environment secrets.
In essence, GitHub’s secret management features greatly improve security, but they do not eliminate the fundamental risk of secret leakage. The design still ultimately trusts developers not to commit secrets (or to respond quickly if they do), and trusts that those with repo write access won’t abuse secrets. Attackers target the gaps in these assumptions.
5. Repository Security Design Issues
This category covers various GitHub repository behaviors that, by design, can result in unexpected exposure or persistence of data.
Forking & Data Propagation: Forks are full copies of a repository. If a private repository is forked (note: GitHub only allows forking of private repos within the same organization or to a user who has access, to prevent obvious leaks), it creates another instance of that data. A security issue arises if an org allows private forks outside its control. For example, an organization member forks an internal repo to their personal GitHub account (the repo remains private in their account). Later, that user leaves the company – but they still have that fork, which the organization can no longer monitor. In GitHub Enterprise, admins can disable forking of private repos entirely to prevent this. If not disabled, there’s a reliance on trust that departing employees will delete any private forks. This is a design choice that can lead to “data leakage” if policies aren’t enforced.
Repository Name Reuse & Ghost Repositories: When a repository is renamed or transferred, the old namespace can potentially be claimed by someone else, leading to hijacking scenarios. (See the analysis of RepoJacking under Supply Chain Attacks: Package Hijacking, Repo Jacking & Namespace Reuse Attacks.)
On the notion of "ghost" repositories: sometimes you'll see references to commits by the "ghost" user; that happens when the original account was deleted. A "ghost repository" could refer to a repo that is somehow detached from an active owner (perhaps via user deletion). Such repos aren't normally accessible (GitHub would schedule them for deletion or transfer them to a new owner), but any data left behind could be a concern. In general, dangling references become security issues if an attacker occupies the old reference.
Unintended Data Retention: GitHub’s backend retains data for convenience and recovery. When you delete a repository, GitHub allows restoration within 90 days. This is great if the deletion was a mistake, but it means a copy of the repo lingers in GitHub’s servers. An attacker who compromises the account could restore the deleted repo and access its data (assuming they know of its existence). Similarly, GitHub Actions artifacts and logs by default persist for 90 days. They might contain sensitive info (tests output, logs that may have snippets of data). If a repository is made private or deleted, historically those artifacts might still be accessible if one had their URLs or via API with a token. GitHub has improved this (now artifacts from private repos require auth), but a concern is that data persists in multiple forms – in forks, in backups, in caches – beyond the repository itself.
Ghost Commit History: Even after a repository is made private or deleted, if someone forked it while it was public, that fork keeps the commit history. This is not a bug but inherent to Git's distributed nature. It means sensitive data that was ever public on GitHub might survive in countless forks outside of GitHub's control (or even on GitHub, if the fork wasn't deleted). GitHub's search may also surface it from cached indexing for a while, even after deletion. Attackers who know this will scrape such data quickly, and AI crawlers may ingest it for training as well. Design-wise, once a secret or sensitive file hits GitHub, assume it's out there forever.
6. SHA-1 Collision Attacks (a.k.a. SHAttered)
GitHub, like Git itself, relies on SHA-1 hashes to identify commits, blobs, and trees. While efficient, SHA-1 is no longer cryptographically secure. In 2017, the SHAttered attack proved that two different files could share the same SHA-1 hash. In 2020, a more practical Chosen-Prefix Collision attack showed how two Git commits could have identical SHA-1 hashes but different contents—opening the door to stealthy repository tampering. GitHub proactively implemented collision detection mechanisms to block known attack patterns, and no real-world Git-based collision attacks have been seen on GitHub since. However, most repositories still use SHA-1, and the risks are far from gone.
SHA-1 collisions are computationally expensive today, but that won't always be the case. Advances in computing power (and potentially quantum techniques) will make them easier and cheaper. This makes SHA-1 a latent supply chain threat that could allow attackers to inject malicious commits, alter history, or bypass integrity checks. SHA-1 is a ticking time bomb in the Git ecosystem. While GitHub has implemented mitigations, Git-based workflows require a full transition to collision-resistant hashing algorithms like SHA-256. The risk may be theoretical today, but the landscape is evolving fast.
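For concreteness, Git's object IDs are SHA-1 over a typed header plus the content, which is easy to reproduce; the well-known ID of the empty blob falls out directly. (Git's SHA-256 object format, available via git init --object-format=sha256, replaces this construction.)

```python
import hashlib

# Reproduce Git's SHA-1 object ID: sha1(b"blob <size>\0" + content).
# If two different contents collide under SHA-1, they become the
# same Git object -- the core of the tampering risk described above.

def git_blob_sha1(content: bytes) -> str:
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The famous empty-blob ID that every Git repository shares:
print(git_blob_sha1(b""))
# -> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the object ID is the sole integrity anchor for commits, trees, and blobs, a practical chosen-prefix collision on this construction would let two different histories present identical IDs.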
Credit to Cameron Ruatta for sparking a deeper dive on this attack.
Conclusion
The findings outlined in this threat model represent just a portion of the design-level security concerns inherent in GitHub. While GitHub continues to add new security features and evolve its posture, many of the core issues stem from trade-offs baked into the platform's DNA—trade-offs between openness and control, ease of use and least privilege, speed and verification.
Several important areas not covered in this analysis still warrant deeper exploration:
Enterprise Security Controls – especially around SCIM deprovisioning and audit log fidelity.
Collaboration Surface – GitHub Discussions, Codespaces, and Projects all introduce their own trust boundaries.
GitHub Marketplace - third-party integrations pose risks similar to browser extensions, but at the repo/org level.
At some point during this research, I realized GitHub’s surface area is practically infinite. I told myself I’d stop digging—but every feature led to another rabbit hole. So no, this isn’t the end of the review. It’s just a pause.
GitHub is one of the most powerful platforms we rely on—but with that power comes complexity. If you care about the integrity of the software supply chain, keep going. GitHub isn’t insecure by accident. The trade-offs make it insecure by design—and that’s exactly why it deserves our attention.