A 15-year-old vulnerability in the open source Python programming language is still finding its way into live code, with the result that over 350,000 projects are at risk of potential supply chain cyber attacks, according to threat researchers at Trellix, the recently spun-up union of FireEye and McAfee.
CVE-2007-4559 is a directory traversal vulnerability in the “extract” and “extractall” functions in Python’s tarfile module. When exploited, it allows a user-assisted remote attacker to overwrite arbitrary files via a “..” (dot dot) sequence in filenames in a TAR archive, which can ultimately lead to arbitrary code execution or control of the target device.
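To illustrate the kind of check that guards against this class of bug, the following is a minimal Python sketch that validates every archive member before extraction. It is not taken from Trellix’s research or from the Python project; the helper names `is_within_directory` and `safe_extractall` are hypothetical, and the sketch assumes that rejecting every symbolic and hard link is acceptable for the caller.

```python
import os
import tarfile


def is_within_directory(base_dir, target):
    """Return True if target resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def safe_extractall(archive_path, dest_dir):
    """Extract archive_path into dest_dir, refusing members that could escape it.

    Hypothetical helper for illustration only: it rejects any member whose
    name resolves outside dest_dir, and (as a simplifying assumption) any
    symbolic or hard link.
    """
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            member_path = os.path.join(dest_dir, member.name)
            if not is_within_directory(dest_dir, member_path):
                raise ValueError(f"unsafe path in archive: {member.name!r}")
            if member.issym() or member.islnk():
                raise ValueError(f"link not allowed in archive: {member.name!r}")
        # Every member has been checked, so extraction stays inside dest_dir.
        tar.extractall(dest_dir)
```

A call such as `safe_extractall("upload.tar", "unpacked")` would raise `ValueError` for an archive containing a member named `../evil.txt` rather than writing outside the destination. Python 3.12 and later also accept a `filter` argument to `extractall`, for example `filter="data"`, which applies comparable checks inside the standard library.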
When it first emerged in October 2007 – a month before the first-generation iPhone hit UK stores – Red Hat deemed the vulnerability of low importance. However, according to Trellix’s threat researchers, it is still widespread in frameworks created by the likes of Amazon Web Services, Google, Intel and Netflix, as well as in multiple other applications used for machine learning, automation and Docker containerisation.
“When we talk about supply chain threats, we typically refer to cyber attacks like the SolarWinds incident. However, building on top of weak code foundations can have an equally severe impact,” said Christiaan Beek, head of adversarial and vulnerability research at Trellix.
“This vulnerability’s pervasiveness is furthered by industry tutorials and online materials propagating its incorrect usage. It’s critical for developers to be educated on all layers of the technology stack to properly prevent the reintroduction of past attack surfaces.”
Trellix’s principal engineer and director of vulnerability research, Doug McKee, said the research team stumbled across CVE-2007-4559 in an undisclosed environment somewhat by accident, while investigating an unrelated issue. At first, he explained, the team thought they had found a new zero-day, but on digging deeper, they found this was not the case.
“When we started pulling on the proverbial thread, we couldn’t believe what unravelled,” said McKee. “With standard public access to GitHub we were able to find over 300,000 files that contained Python’s tarfile module and an average of 61% were vulnerable to an attack as a result of CVE-2007-4559 in 2022.”
Trellix contacted GitHub to try to understand the issue better. Working together, the two were able to determine there were approximately 2.87 million open source files containing the Python tarfile module in 588,000 unique repositories spanning a vast number of sectors.
“There is no one party, organisation or person to blame for the current state of CVE-2007-4559, but here we are anyway,” said McKee.
“We need to start by considering that open source projects like the Python project are often run and maintained by a group of volunteers. In this case, Python is run and owned by the Python Software Foundation (PSF), which is a non-profit organisation. It is often harder for these types of groups to obtain resources, perform vigorous reviews, make unilateral decisions, and track and therefore fix these types of issues in a timely manner.”
He continued: “In cases like this, there is often also a debate on whether there are legitimate use cases for the behaviour of a module. We have seen the argument, including in this case, that just because an aspect of a function could be used for a malicious purpose, does it mean it ultimately needs to be removed? Should we remove the streetlamp because someone could push you into it? In this instance, I believe the risk outweighs the reward on accommodating a few corner cases.”
Trellix is now working to push code via GitHub pull requests to protect open source projects from CVE-2007-4559, and will make a free tool available so developers can check whether their applications are at risk. This can be found on Trellix’s GitHub page.
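For a sense of what such a check might involve, the sketch below uses Python’s `ast` module to flag `extract` and `extractall` calls in source code that imports `tarfile`. This is not Trellix’s tool and makes no claim about how that tool works; the function name `find_tar_extract_calls` is hypothetical, and the heuristic is deliberately crude, so its findings are candidates for manual review rather than confirmed vulnerabilities.

```python
import ast


def find_tar_extract_calls(source):
    """Return sorted (line, method) pairs for extract/extractall calls.

    Hypothetical heuristic: only reports calls when the source imports
    tarfile, and does not verify that the receiver is actually a TarFile
    or whether member names are validated beforehand.
    """
    tree = ast.parse(source)

    # Step 1: does this module import tarfile at all?
    imports_tarfile = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name == "tarfile" for alias in node.names):
                imports_tarfile = True
        elif isinstance(node, ast.ImportFrom) and node.module == "tarfile":
            imports_tarfile = True
    if not imports_tarfile:
        return []

    # Step 2: collect method calls named extract or extractall.
    findings = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"extract", "extractall"}
        ):
            findings.append((node.lineno, node.func.attr))
    return sorted(findings)
```

Each flagged line would still need a person to confirm whether the archive comes from an untrusted source and whether its member names are checked before extraction.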
The publication of Trellix’s research on the Python tarfile vulnerability also marks the opening of its Advanced Research Centre for global threat intelligence, bringing together hundreds of expert cyber analysts and researchers to produce actionable real-time intelligence and threat indicators. It will also serve as the driving force behind the company’s flagship extended detection and response (XDR) platform.
Aparna Rayasam, Trellix chief product officer, said: “The threat landscape is scaling in sophistication and potential for impact. We do this work to make our digital and physical worlds safer for everyone. With adversaries strategically investing in talent and technical know-how, the industry has a duty to study the most combative actors and their methods to innovate at a faster rate.”