If you’ve used open-source software in your development process, you’ve relied on code published to public repositories as “packages”. Those packages are downloaded and installed by a package manager, and each language ecosystem has its own package manager and registry, e.g. pip and the Python Package Index (PyPI) for Python, npm for JavaScript, RubyGems for Ruby, NuGet for .NET, etc. These packages can become an attack vector when a malicious actor uses dependency substitution or typosquatting to create a supply chain attack.
Open source is open to supply chain attacks
The spirit of open source collaboration and contribution is definitely awesome for creating new applications, but it is this very trust and openness that leaves this crucial part of the software supply chain susceptible to attacks by malicious actors. There are many different ways those bad actors try to exploit the vast network of open source packages. Today, we’ll cover two major attack vectors: dependency confusion/substitution and combo/typo-squatting.
Let’s say you work at a big company, and your software relies on the installation of many internal packages. In theory, if these packages are hosted privately and only vetted and trusted people can contribute or publish code, you should be pretty safe. The problem is, many projects rely on internal packages whose names also exist (or can be registered by anyone) in public registries, and a package manager can be duped into installing a malicious public package instead of the internal one. This can happen not only if the names are identical, but even if they are merely close: the malicious package might be a typo’d version of the legitimate package’s name. As an example, the npm package ‘loadsh’ exploited a common misspelling of the popular package ‘lodash’. Both techniques are forms of supply chain attack.
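To make the typosquatting idea concrete, here is a minimal sketch of a name-similarity check using Python’s standard `difflib` module. The allowlist `KNOWN_PACKAGES`, the function name `possible_typosquats`, and the 0.8 cutoff are assumptions chosen for illustration; this is not how any particular registry or scanner detects typosquats.

```python
import difflib

# Hypothetical allowlist of package names your project legitimately uses.
KNOWN_PACKAGES = ["lodash", "requests", "numpy"]


def possible_typosquats(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return known names that `name` closely resembles without matching exactly.

    An exact match is treated as legitimate and returns an empty list.
    """
    if name in known:
        return []
    # get_close_matches ranks candidates by difflib's similarity ratio
    # and keeps those scoring at least `cutoff`.
    return difflib.get_close_matches(name, known, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for candidate in ["lodash", "loadsh", "flask"]:
        print(candidate, "->", possible_typosquats(candidate))
```

A non-empty result only means the name looks suspiciously close to a package you trust; plenty of legitimate packages have similar names, so a hit like this is a prompt for manual review rather than proof of an attack.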
How big of a problem is this? And who might it affect? In 2016, a German university student managed to get various branches of the US government and military (!) to download his packages. None of them contained anything actually malicious, but they proved just how far the reach of such attacks might be. More recently, cybersecurity researcher Alex Birsan managed to hack Apple, Tesla, Microsoft, and a host of other major corporations using similar techniques. Not too long after that, both the npm and PyPI indices were flooded with thousands of dependency copycats, leading the admins* to delete thousands of offending packages. Thankfully (again), most of these packages were not actually malicious, at least at first. Eventually, malicious payloads that attempted to steal passwords from companies such as Amazon, Slack, Lyft, and Zillow showed up. The companies denied any harm being done to them or their customers, but well, you get the point.
So what exactly is happening behind the scenes with the package managers? Let’s take Python’s pip package manager as an example. When you execute the ‘pip install somecoolpackage’ command, where, exactly, is the package (and its dependencies) being downloaded from? In pip, two CLI flags control this: the standard “--index-url” flag and the “--extra-index-url” flag. The former sets the primary index, which defaults to PyPI, while the latter adds a further index, presumably your private, internal URL. Allowing multiple sources for downloads is a common feature in package managers. Unfortunately, pip does not rank one configured index above another, so you cannot strongly prioritize (or force) installation from your internal index, which means there is still a very real risk that you’ll install a malicious version from a public index. Versioning matters too! By default, pip tends to install the latest available version of any package, which means that if someone manages to publish ‘somecoolpackage version 100000000.1’ publicly, pip will prefer that one over any lower version stored privately. This isn’t just a Python thing either; this whole issue of priority confusion stemming from multiple sources also affects npm/JavaScript, gem/Ruby, PHP, and more.
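The version-priority problem can be illustrated with a deliberately simplified model. The sketch below is not pip’s actual resolver; it only assumes, as described above, that candidates from every configured index are pooled and the highest version wins. The names `parse_version` and `pick_candidate`, and the plain dotted-integer version format, are assumptions made for the example.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '1.4.2' into a comparable tuple.

    Simplification: real version schemes allow pre-releases, epochs, etc.
    """
    return tuple(int(part) for part in version.split("."))


def pick_candidate(candidates):
    """Pick the (source, version) pair with the highest version.

    `candidates` pools results from every index; the source label plays
    no part in the comparison, which is exactly the weakness.
    """
    return max(candidates, key=lambda item: parse_version(item[1]))


if __name__ == "__main__":
    pooled = [
        ("private-index", "1.4.2"),        # the real internal package
        ("public-index", "100000000.1"),   # an attacker's upload, same name
    ]
    print(pick_candidate(pooled))
```

Because nothing in the comparison looks at where a candidate came from, an attacker only needs a larger version number to win the selection.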
So, what can be done about this? How can you protect yourself from a supply chain attack? Sure, you could try to insulate yourself from the outside world by hosting all your packages in a private index, e.g. deploying a private PyPI-compatible server using your own (trusted) storage backend. However, this is not necessarily a tenable (or practical) solution for every person or workplace.
There is a simple and effective way that SOOS SCA can help keep your code and assets safe from these forms of attack. Before you scan your manifest files, you can enter regular expression patterns for your internal package names. This functions like a mask: if your manifest lists any packages/dependencies that match your internal naming conventions, we will warn you that these might be dependency substitution attacks. Whether you’re working with a lone GitHub repository or have an entire team using CI/CD platforms such as Azure DevOps, Jenkins, TeamCity, or Travis CI, SOOS can continually monitor your manifests and fail your builds if the scan finds any potential impostor packages. Security should never be an afterthought. SOOS SCA scans have the advantage of being automated and baked into your CI/CD process, so you can focus more on what you do best: developing great software.
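The masking idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not SOOS’s implementation: the patterns in `INTERNAL_PATTERNS`, the function name `flag_internal_names`, and the assumption that the manifest is a pip-style requirements list are all hypothetical.

```python
import re

# Hypothetical naming conventions for internal packages.
INTERNAL_PATTERNS = [r"^acme-.*", r"^internal_.*"]


def flag_internal_names(requirement_lines, patterns=INTERNAL_PATTERNS):
    """Return package names in a requirements-style manifest that match
    an internal naming pattern and so deserve a closer look."""
    compiled = [re.compile(pattern) for pattern in patterns]
    flagged = []
    for line in requirement_lines:
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Keep only the name: cut at the first version specifier,
        # extras bracket, environment marker, or space.
        name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower()
        if any(pattern.match(name) for pattern in compiled):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    manifest = [
        "requests==2.31.0",
        "acme-billing>=1.0",
        "# build tooling",
        "internal_utils",
    ]
    print(flag_internal_names(manifest))
```

In a CI job, a non-empty result could be used to fail the build until someone confirms that each flagged package really resolves to the internal source.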
Conclusion
The bottom line is that there are bad actors out there actively trying to sneak their way into companies’ build and deploy pipelines through supply chain attacks. Cybersecurity is a multi-faceted problem, but one of the best ways to mitigate risk is by using tools to do most of the heavy lifting. Check out SOOS SCA for an affordable solution that can help prevent dependency substitution and typosquatting attacks.
*Side note: some researchers have estimated that there are over 400,000 package owners on PyPI, but only about 10 people with admin or moderator rights, that is, only 10 people who can remove packages or ban projects. That’s roughly 40,000 owners per admin! So yeah, it might take some time before they get around to it…