Episode 221

Posted on Saturday, Mar 9, 2024
Andrei is back to discuss recent academic research into malware within the Python/PyPI ecosystem and whether it is possible to effectively combat it with open source tooling, plus we cover security updates for Unbound, libuv, node.js, the Linux kernel, libgit2 and more.

Show Notes

This week in Ubuntu Security Updates

56 unique CVEs addressed

[USN-6665-1] Unbound vulnerabilities (00:50)

[USN-6666-1] libuv vulnerability (01:16)

  • 1 CVE addressed in Focal (20.04 LTS), Jammy (22.04 LTS), Mantic (23.10)
  • Asynchronous event handling library, used by Node.js and others; supports asynchronous handling of TCP/UDP sockets, DNS resolution, file system operations, etc.
  • Would truncate hostnames to 256 characters before calling getaddrinfo() but then fail to NUL-terminate the string, so getaddrinfo() would read past the end of the buffer and the address that got resolved might not be the intended one. A remote attacker who could influence the hostname could therefore cause the application to contact a different address than expected, and so perhaps access internal services
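
The danger of silent truncation can be sketched with a small Python model. This is an illustration under stated assumptions, not libuv code: it assumes a hypothetical application that allowlists hostnames by suffix before handing them to a resolver which, like the vulnerable code, keeps only the first 256 characters. The helper names, the `.allowed.example.com` suffix and the `attacker.net` domain are made up, and the missing NUL terminator of the real C bug (an out-of-bounds read) has no Python equivalent, so it is not modelled.

```python
MAX_HOSTNAME = 256  # truncation length described for the vulnerable libuv code
ALLOWED_SUFFIX = ".allowed.example.com"  # hypothetical allowlist entry


def is_allowed(hostname: str) -> bool:
    # The application's check runs on the full, untruncated hostname
    return hostname.endswith(ALLOWED_SUFFIX)


def buggy_truncate(hostname: str) -> str:
    # Models the vulnerable behaviour: silently keep only the first 256 characters
    return hostname[:MAX_HOSTNAME]


def checked_hostname(hostname: str) -> str:
    # Safer approach: reject over-long names instead of truncating them
    if len(hostname) > MAX_HOSTNAME:
        raise ValueError("hostname too long")
    return hostname


# Four 60-character labels plus ".attacker.net" is exactly 256 characters
prefix = ".".join(["a" * 60] * 4) + ".attacker.net"
payload = prefix + ALLOWED_SUFFIX

print(is_allowed(payload))                # True: passes the allowlist
print(buggy_truncate(payload) == prefix)  # True: the resolver only sees attacker.net
```

Rejecting over-long input, as `checked_hostname()` does, keeps the string that was validated identical to the string that gets resolved.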

[USN-6667-1] Cpanel-JSON-XS vulnerability (02:21)

  • 1 CVE addressed in Focal (20.04 LTS), Jammy (22.04 LTS)
  • Perl module for JSON serialisation
  • OOB read on crafted JSON - when parsing in relaxed mode, if the JSON was malformed and missing a colon, it would read beyond the end of the data, potentially resulting in an info-leak or a crash
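
Parsers written in C typically walk the input with a cursor, and every lookahead has to be checked against the end of the buffer. A rough Python sketch of that discipline for the key/value separator follows; `expect_colon()` and `JSONParseError` are hypothetical names and do not reflect Cpanel-JSON-XS internals. Python would raise `IndexError` rather than read out of bounds, so the explicit `pos >= len(text)` check stands in for the bounds check a C parser needs.

```python
class JSONParseError(ValueError):
    pass


def skip_ws(text: str, pos: int) -> int:
    # Advance past whitespace, never beyond the end of the input
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def expect_colon(text: str, pos: int) -> int:
    """Consume the ':' separating an object key from its value (hypothetical helper).

    Every lookahead is checked against len(text) first; in C, skipping this
    check is what turns a missing colon at the end of the input into an
    out-of-bounds read.
    """
    pos = skip_ws(text, pos)
    if pos >= len(text):
        raise JSONParseError("unexpected end of input: expected ':'")
    if text[pos] != ":":
        raise JSONParseError(f"expected ':' at offset {pos}")
    return pos + 1  # position just after the colon
```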

[USN-6668-1] python-openstackclient vulnerability (02:55)

  • 1 CVE addressed in Focal (20.04 LTS), Jammy (22.04 LTS)
  • When deleting an access rule, would search for it by name - if no such rule existed, it could end up returning a different rule, which would then get deleted instead - the fix changes the semantics to only allow rules to be deleted via their ID, which is unique
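
The semantic change can be illustrated with a hypothetical in-memory model in Python. `buggy_find()` assumes a lookup that falls back to the first rule when nothing matches, which is one way to get the "different rule gets deleted" behaviour described above; it is not the actual python-openstackclient code. `delete_rule()` shows the fixed semantics of deleting only by unique ID.

```python
from dataclasses import dataclass


@dataclass
class AccessRule:
    id: str    # unique identifier
    name: str  # human-friendly label, not guaranteed to be unique


# Hypothetical rule store keyed by ID
rules = {
    "r1": AccessRule("r1", "list-projects"),
    "r2": AccessRule("r2", "create-users"),
}


def buggy_find(name: str) -> AccessRule:
    # Hypothetical flawed lookup: when no rule matches the name,
    # it falls back to the first rule instead of reporting an error
    for rule in rules.values():
        if rule.name == name:
            return rule
    return next(iter(rules.values()))


def buggy_delete_by_name(name: str) -> AccessRule:
    # Deletes whatever the lookup returned, possibly the wrong rule
    return rules.pop(buggy_find(name).id)


def delete_rule(rule_id: str) -> AccessRule:
    # Fixed semantics: only the unique ID identifies the rule to delete
    if rule_id not in rules:
        raise KeyError(f"no access rule with ID {rule_id!r}")
    return rules.pop(rule_id)
```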

[USN-6648-2] Linux kernel (Azure) vulnerabilities (03:23)

[USN-6651-2, USN-6651-3] Linux kernel (including StarFive) vulnerabilities (03:52)

[USN-6653-2, USN-6653-3, USN-6653-4] Linux kernel (AWS, Low Latency & GKE) vulnerabilities (04:07)

[USN-6647-2] Linux kernel (Azure) vulnerabilities (04:15)

[USN-6670-1] php-guzzlehttp-psr7 vulnerabilities (04:36)

  • 2 CVEs addressed in Focal (20.04 LTS), Jammy (22.04 LTS)
  • HTTP message library conforming to the PSR-7 specification - failed to properly account for embedded newlines in HTTP headers - classic HTTP smuggling attack vuln
  • Original fix from 2022 was found to be incomplete, so an additional CVE was assigned for the follow-up fix
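
Header injection and smuggling rely on CR or LF characters being accepted inside a header. A minimal validation sketch in Python follows; it is not code from any PSR-7 library (those are PHP), and it assumes the RFC 7230 "token" grammar for header names.

```python
import re

# RFC 7230 "token" characters allowed in a header field name
HEADER_NAME_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def validate_header(name: str, value: str) -> None:
    """Illustrative check: raise ValueError if a header could smuggle extra lines."""
    # fullmatch() rather than a "$" anchor, because "$" also matches
    # just before a trailing newline
    if not HEADER_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid header name: {name!r}")
    # A CR or LF would let an attacker end this header and start a new one
    if "\r" in value or "\n" in value:
        raise ValueError("header value contains CR or LF")
```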

[USN-6671-1] php-nyholm-psr7 vulnerability (05:15)

  • 1 CVE addressed in Jammy (22.04 LTS)
  • Alternative PSR-7 implementation which also suffered from the same issue

[USN-6669-1] Thunderbird vulnerabilities (05:35)

[USN-6672-1] Node.js vulnerabilities (06:03)

  • 3 CVEs addressed in Focal (20.04 LTS), Jammy (22.04 LTS), Mantic (23.10)
  • Leverages OpenSSL for cryptographic work - failed to clear the OpenSSL error stack when calling various routines, so subsequent calls to OpenSSL from the same thread could see false-positive errors, leading to a DoS - e.g. a remote attacker could provide an invalid cert which would set this error, and subsequent routines validating certs would then also appear to fail, even for valid certs
  • Uses ICU for Unicode handling - allows a user to specify their own ICU data via an environment variable - but node.js can run in different privilege contexts, so a user could force it to load data under their control when running with elevated privileges
  • ASN.1 encoding issue inherited from OpenSSL
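
OpenSSL records errors on a per-thread error queue, and callers are expected to clear stale entries (for example with `ERR_clear_error()`) before interpreting it. The following Python model uses hypothetical function names and shows how a leftover error from one failed verification can make a later, valid one appear to fail. It models the behaviour described above and is not Node.js code.

```python
import threading

_state = threading.local()  # hypothetical stand-in for OpenSSL's per-thread error queue


def _queue() -> list:
    if not hasattr(_state, "errors"):
        _state.errors = []
    return _state.errors


def openssl_verify(cert_ok: bool) -> None:
    # Stand-in for an OpenSSL routine that pushes an error on failure
    if not cert_ok:
        _queue().append("certificate verify failed")


def verify_buggy(cert_ok: bool) -> bool:
    openssl_verify(cert_ok)
    # Bug: a stale error left by an earlier call is treated as a failure now
    return not _queue()


def verify_fixed(cert_ok: bool) -> bool:
    _queue().clear()  # analogous to calling ERR_clear_error() first
    openssl_verify(cert_ok)
    return not _queue()
```

Calling `verify_buggy(False)` and then `verify_buggy(True)` on the same thread reports the valid certificate as invalid, which is the denial of service described above. `verify_fixed()` starts from a clean queue each time and does not have this problem.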

[USN-6673-1] python-cryptography vulnerabilities (07:30)

  • 2 CVEs addressed in Bionic ESM (18.04 ESM), Focal (20.04 LTS), Jammy (22.04 LTS), Mantic (23.10)
  • Another issue of mishandling the OpenSSL API - in this case would not properly handle errors returned from OpenSSL when processing certificates that had incorrect padding (talked about this last week in [USN-6663-1] OpenSSL update)
  • Mishandled the error case when a PKCS#12 key and certificate did not match one another - this would trigger an exception at runtime

[USN-6674-1, USN-6674-2] Django vulnerability (08:22)

  • 1 CVE addressed in Bionic ESM (18.04 ESM), Focal (20.04 LTS), Jammy (22.04 LTS), Mantic (23.10)
  • ReDoS in Truncator (used by template filters such as truncatewords_html) - if supplied an input string consisting entirely of opening angle brackets <<<<<<... it would cause exponential performance degradation
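
ReDoS happens when a backtracking regular expression engine explores a huge number of ways to match input such as a long run of `<` characters. One general mitigation is to avoid backtracking entirely. The Python sketch below strips `<...>` tags in a single linear-time pass; it is a simplified illustration (it ignores, for instance, `>` inside quoted attribute values) and is not Django's actual fix.

```python
def strip_tags_linear(text: str) -> str:
    """Simplified illustration: remove <...> tags in one left-to-right pass, O(n)."""
    out = []
    i = 0
    while i < len(text):
        start = text.find("<", i)
        if start == -1:
            out.append(text[i:])  # no more tags: keep the rest as text
            break
        end = text.find(">", start + 1)
        if end == -1:
            # No closing '>' anywhere after this point, so the rest is plain text;
            # stopping here avoids rescanning for every remaining '<'
            out.append(text[i:])
            break
        out.append(text[i:start])  # text before the tag
        i = end + 1                # resume after the tag
    return "".join(out)
```

Because each character is scanned at most once, an input of 100,000 `<` characters is handled in a single pass rather than triggering pathological backtracking.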

[USN-6675-1] ImageProcessing vulnerability (08:52)

  • 1 CVE addressed in Focal (20.04 LTS), Jammy (22.04 LTS)
  • Image processing library for Ruby based on ImageMagick
  • If an application allowed the user to specify the set of operations to be performed, this could be abused to get arbitrary shell command execution - internally it used send() rather than public_send(), which allowed access to private methods such as Kernel#system that directly execute shell commands
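
Python has no equivalent of Ruby's `send()`/`public_send()` distinction, since `getattr()` can reach any method. The analogous defence is to dispatch user-specified operations only through an explicit allowlist, which is even stricter than `public_send()`. The `Pipeline` class, its operations and `apply_operations()` below are hypothetical and not part of ImageProcessing.

```python
class Pipeline:
    """Hypothetical image pipeline; only the allowlisted operations are user-facing."""

    ALLOWED_OPERATIONS = frozenset({"resize", "rotate"})

    def __init__(self) -> None:
        self.steps: list[tuple[str, tuple]] = []

    def resize(self, width: int, height: int) -> None:
        self.steps.append(("resize", (width, height)))

    def rotate(self, degrees: int) -> None:
        self.steps.append(("rotate", (degrees,)))

    def _run_shell(self, command: str) -> None:
        # Internal helper that must never be reachable from user input
        raise RuntimeError("shell execution reached")


def apply_operations(pipeline: Pipeline, operations: list[tuple[str, tuple]]) -> Pipeline:
    for name, args in operations:
        # Refuse anything not explicitly exposed, rather than dispatching
        # blindly via getattr() (the Python counterpart of Ruby's send())
        if name not in Pipeline.ALLOWED_OPERATIONS:
            raise ValueError(f"operation not allowed: {name!r}")
        getattr(pipeline, name)(*args)
    return pipeline
```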

[USN-6677-1] libde265 vulnerabilities (09:23)

[USN-6678-1] libgit2 vulnerabilities (09:50)

  • 5 CVEs addressed in Xenial ESM (16.04 ESM), Bionic ESM (18.04 ESM), Focal (20.04 LTS), Jammy (22.04 LTS), Mantic (23.10)
  • Used by various tools like cargo, gnome-builder etc
  • Fix for a possible infinite loop (CPU-based DoS) when parsing a crafted revision named simply @
  • Use-after-free when handling crafted input to git_index_add
  • Mishandles equivalent filenames due to NTFS Data Streams (similar to CVE-2019-1352 - [USN-4220-1] Git vulnerabilities from Episode 56)
  • Failed to perform certificate checking when using an SSH remote via the optional libssh2 backend - which we do in Ubuntu
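
The NTFS problem is that several different spellings map to the same on-disk name, so a naive comparison against `.git` is not enough. A simplified Python check follows. It covers only the equivalences listed in its docstring and is not libgit2's implementation, which handles more cases (for example 8.3 short names such as `GIT~1`).

```python
def is_ntfs_dotgit(component: str) -> bool:
    """Simplified check for path components NTFS would treat as ".git".

    Covers case-insensitivity, trailing dots/spaces (stripped by Windows)
    and the "::$INDEX_ALLOCATION" stream suffix used in CVE-2019-1352.
    """
    name = component.casefold()
    suffix = "::$index_allocation"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    name = name.rstrip(". ")
    return name == ".git"
```

A checkout routine could use such a check to refuse writing any path component that NTFS would resolve to the repository's `.git` directory.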

[USN-6649-2] Firefox regressions (10:47)

[USN-6676-1] c-ares vulnerability (10:55)

  • 1 CVE addressed in Xenial ESM (16.04 ESM), Bionic ESM (18.04 ESM), Focal (20.04 LTS), Jammy (22.04 LTS), Mantic (23.10)
  • Asynchronous DNS lookup library
  • Failed to properly handle embedded NUL characters when parsing /etc/resolv.conf, /etc/hosts, /etc/nsswitch.conf or any file specified via the HOSTALIASES environment variable - if a line had an embedded NUL as its first character, it would attempt to read memory prior to the start of the buffer, resulting in an OOB read -> crash
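
One common shape of this kind of bug, assumed here for illustration, is measuring a line with `strlen()` (which stops at the first NUL) and then inspecting `line[len - 1]` to strip a trailing newline: with a leading NUL, `len` is 0 and the code reads the byte before the buffer. The Python model below uses hypothetical helper names. Python's negative indexing silently wraps around instead of crashing, which the buggy version exposes as a nonsensical length of -1.

```python
def c_strlen(buf: bytes) -> int:
    # Like C strlen(): length up to the first NUL byte
    nul = buf.find(b"\0")
    return len(buf) if nul == -1 else nul


def strip_newline_buggy(buf: bytes) -> int:
    # Hypothetical buggy helper: returns the line length without its newline
    n = c_strlen(buf)
    # With a leading NUL, n == 0 and this reads index -1: one byte *before*
    # the buffer in C (Python silently wraps to the last byte instead)
    if buf[n - 1] == ord("\n"):
        return n - 1
    return n


def strip_newline_fixed(buf: bytes) -> int:
    n = c_strlen(buf)
    # Guard the lookback so an empty (or NUL-first) line is never underflowed
    if n > 0 and buf[n - 1] == ord("\n"):
        return n - 1
    return n
```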

Goings on in Ubuntu Security Community

Andrei discusses malware detection within the Python and PyPI ecosystem (11:46)

Hey, Alex!

Today we will continue our journey beyond the scope of the previous episodes, in which we delved into the realms of network security, federated infrastructures, and vulnerability detection and assessment.

Today’s paper

Last year, the Ubuntu Security Team participated in the Linux Security Summit in Bilbao. There, I had a discussion with Zach, who gave a presentation at the Supply Chain Security Con entitled “Will Large-Scale Automated Scanning Stop Malware on OSS Repositories?”. I later discovered that his talk was based on a paper that he and his colleagues from Chainguard had published.

With this in mind, today we will be examining “Bad Snakes: Understanding and Improving Python Package Index Malware Scanning”, which was published last year in ACM’s International Conference on Software Engineering.

The aim of the paper is to highlight the current state of the Python and PyPI ecosystems from a malware detection standpoint, identify the requirements for a mature malware scanner that could be integrated into PyPI, and ascertain whether the existing open-source tools meet these objectives.

Repositories and PyPI

Let’s start by understanding the context.

Applications can be distributed through repositories. This means that the applications are packaged into a generic format and published in either managed or unmanaged repositories. Users can then install an application by querying the repositories and downloading it in a format that their client can unpack, after which it can run on their hosts.

There are numerous repositories out there. Some target specific operating systems, as is the case with Debian repositories, the Snap Store, Google Play, or the Microsoft Store. Others are designed to store packages for a specific programming language, such as PyPI, npm, and RubyGems. Firefox Add-ons and the Chrome extension store target a specific platform, namely the browser.

Another relevant characteristic when discussing repositories is the level of curation. The Ubuntu Archive is considered a curated repository of software packages because only a set of trusted contributors are able to publish software in it. Conversely, npm is unmanaged because any member of the open-source community can publish anything in it.

We will focus extensively on the Python Package Index (PyPI), the de facto unmanaged repository for the Python programming language. As of the 7th of March 2024, it hosted 5.4 million releases for 520 thousand projects and had nearly 800 thousand users. It is governed by a non-profit organisation and run by volunteers worldwide.

Supply chain attacks

Software repositories encourage software to depend on other pieces of software controlled by different parties. As campaigns such as the SolarWinds SUNBURST attack have shown, this can go awry: attackers can gain control over software in a company’s supply chain, use it to gain initial access to the company’s infrastructure, and then exploit this foothold.

Multiple attack vectors are possible. Accounts can be hijacked. Attackers may publish packages with similar names (a tactic known as typosquatting). They can also leverage shrink-wrapped clones, which are duplicates of existing packages into which malicious code is injected once they have gained users’ trust. While covering all attack vectors is beyond the scope of this podcast episode, you can find a comprehensive taxonomy in a paper called “Taxonomy of Attacks on Open-Source Software Supply Chains”, which lists over 100 unique attack vectors.
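
Typosquatting detection is often approached by comparing new package names with popular ones. The Python sketch below combines PEP 503 name normalisation with Levenshtein edit distance. The watch list and the distance threshold of 2 are hypothetical, and this is not presented as a mechanism PyPI itself uses. Legitimately similar names will also be flagged, so a check like this produces candidates for review rather than verdicts.

```python
import re

POPULAR = ["requests", "numpy", "urllib3", "django"]  # hypothetical watch list


def normalize(name: str) -> str:
    # PEP 503 name normalisation: case-insensitive, runs of -_. are equivalent
    return re.sub(r"[-_.]+", "-", name).lower()


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = cur
    return prev[-1]


def possible_typosquat(name: str, max_distance: int = 2) -> list[str]:
    """Return the watched names that `name` is suspiciously close to."""
    candidate = normalize(name)
    return [
        p for p in POPULAR
        if candidate != p and levenshtein(candidate, p) <= max_distance
    ]
```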

From 2017 to 2022, the number of unique projects removed from PyPI increased rapidly: 38 in the first year, followed by 130, 60, 500, 27 thousand, and finally 12 thousand in the last year. Although most of these were reported as malware, it’s worth noting that the impact of some of them was limited because they saw little organic usage.

Malware analysis

These attacks can be mitigated by implementing techniques such as multi-factor authentication, software signing, update frameworks, or reproducible builds, but the most widespread method is malware analysis.

Some engines check for anomalies via static and dynamic heuristics, while others rely on signatures because of their simplicity. Once a piece of software is detected as malicious, its hash is added to a deny list that is embedded in the anti-malware engine. Each scanned file is then hashed and the result is checked against the deny list. If either the heuristics or the hash comparison identifies a file as malicious, it is reported, blocked, or deleted, depending on the strategy implemented by the anti-malware engine.
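
The hash-based part of this workflow is simple to sketch in Python with `hashlib`. The deny list below contains a single hypothetical entry; a real engine would ship a much larger, regularly updated list. Exact hashes only catch byte-for-byte identical files, which is why engines pair them with heuristics.

```python
import hashlib
from pathlib import Path

# Hypothetical deny list: SHA-256 digests of files already known to be malicious
DENY_LIST = {
    hashlib.sha256(b"print('pwned')\n").hexdigest(),
}


def sha256_file(path: Path) -> str:
    # Hash in chunks so large files are not loaded into memory at once
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(paths: list[Path]) -> list[Path]:
    """Return the files whose hash appears on the deny list."""
    return [p for p in paths if sha256_file(p) in DENY_LIST]
```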

Malware analysis in PyPI

These solutions are already implemented in software repositories. In the case of PyPI, malware scanning was introduced in February 2020 through a malware check feature in Warehouse, the application serving PyPI. However, because of an overload of alerts, it was disabled by the administrators two years later and ultimately removed in May 2023.

In addition to this technical solution, PyPI also capitalises on a form of social symbiosis. Software security companies and individuals conduct security research and report any malware they discover to the PyPI administrators via email. The administrators typically allocate 20 minutes per week to review these malware reports and remove any packages that can be verified as true positives. In return, the reporting companies and individuals gain reputation or attention for their brands, products, and services.

Requirements

In addition to gathering information about software repositories, supply chain attacks, malware analysis, and PyPI, the researchers interviewed PyPI administrators to understand their requirements for a malware analysis tool that could assist them. The three interviews, each lasting one hour, were conducted in July and August 2022 and involved only three individuals. This small number is due to the focus on the PyPI ecosystem, where only ten people are directly involved in malware scanning activities.

When discussing requirements, the administrators wanted tools with a binary outcome, which could be determined by checking whether a numerical score exceeds a threshold, and with each decision supported by an explanation. While administrators can tolerate false negatives, they aim to reduce the false positive rate to zero. The tool should also operate on limited resources and be easy to adopt, use, and maintain.
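
A tool meeting these requirements could expose its result as a score, a threshold and the findings that justify it. The Python sketch below is one possible shape; the rule names, weights and the 0.8 default threshold are hypothetical, not taken from the paper. Raising the threshold trades false positives for false negatives, which matches the administrators’ stated preference.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    score: float
    threshold: float
    reasons: list[str] = field(default_factory=list)  # the supporting arguments

    @property
    def malicious(self) -> bool:
        # Binary outcome: flagged only when the score exceeds the threshold
        return self.score > self.threshold


def assess(findings: dict[str, float], threshold: float = 0.8) -> Verdict:
    """Combine hypothetical per-rule weights into one score, keeping rules as justification."""
    score = min(1.0, sum(findings.values()))
    reasons = [f"{rule} (+{weight:.2f})" for rule, weight in sorted(findings.items())]
    return Verdict(score=score, threshold=threshold, reasons=reasons)
```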

Current tooling

But do the current solutions tick these boxes?

The researchers selected tools that analyse the code of packages and have publicly available detection techniques and detection rules. Upon examining the available solutions, they found that only three could be used for evaluation in the context of their research: PyPI’s malware checks, Bandit4Mal, and OSSGadget’s OSS Detect Backdoor.

Regarding the first, it should be noted that the researchers applied the YARA rules not only to the setup files, but to all files in the Python package. The second, Bandit4Mal, is a modified version of the Bandit security linter that adds multiple rules for detecting malicious patterns in the abstract syntax tree (AST) generated from a program’s codebase. The last, OSSGadget’s OSS Detect Backdoor, is a tool developed by Microsoft in June 2020 to perform rule-based malware detection on each file in a package.
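
To give a flavour of AST-based rule matching, the Python sketch below uses the standard `ast` module to flag calls to a few functions commonly abused in malicious `setup.py` files. The rule set is hypothetical and far smaller than Bandit4Mal’s, and the code is not taken from Bandit4Mal.

```python
import ast
from typing import Optional

# Hypothetical rule set: call targets often abused in malicious setup.py files
SUSPICIOUS_CALLS = {"exec", "eval", "os.system", "subprocess.Popen", "urllib.request.urlopen"}


def dotted_name(node: ast.AST) -> Optional[str]:
    # Turn Name/Attribute chains such as os.system into "os.system"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_suspicious_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for each suspicious call, without running the code."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Simple obfuscation, such as `getattr(os, "sys" + "tem")`, already slips past this kind of matching, while benign packages that legitimately call `subprocess` still trigger it, so both false negatives and false positives arise naturally.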

These tools were tested against both malicious and benign Python packages. The researchers used two datasets containing 168 manually selected malicious packages. For the benign packages, they selected 1,400 popular packages and one thousand randomly selected Python packages.

For the evaluation process, they considered an alert in a malicious package to be a true positive and an alert in a benign package to be a false positive.

The true positive rate was 85% for the PyPI checks, the same for OSS Detect Backdoor, and 90% for Bandit4Mal. The false positive rates ranged from 15% for the PyPI checks on the randomly selected packages to 80% for Bandit4Mal on the popular packages.
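
The two metrics follow directly from the definitions above: the true positive rate is the fraction of malicious packages that raised an alert, and the false positive rate is the fraction of benign packages that did. The Python sketch below computes both; any counts passed to it in an example are hypothetical, not the paper’s raw numbers.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn)  # share of malicious packages that raised an alert
    fpr = fp / (fp + tn)  # share of benign packages that raised an alert
    return tpr, fpr


def expected_false_alerts(fpr: float, benign_packages: int) -> float:
    # Alerts an analyst would have to triage that turn out to be benign
    return fpr * benign_packages
```

For instance, at a 15% false positive rate, every 1,000 benign packages scanned would be expected to produce about 150 false alerts.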

The tools ran in a time-effective manner, with a median time of around two seconds per package across all datasets. The maximum runtime was recorded for Ansible’s package, which was scanned in 26 minutes.

Despite their efficient run times, we can infer from these results that the tools are not accurate enough to meet the demands of PyPI’s administrators. The analysts could be overwhelmed by alerts for benign packages, which would interfere with their other operations.

Conclusions

And with this, we conclude this segment of the Ubuntu Security Podcast on the paper “Bad Snakes: Understanding and Improving Python Package Index Malware Scanning”. We have discussed software repositories, malware analysis, and malware-related operations within PyPI. We’ve also explored the requirements a new open-source Python malware scanner would need to meet to be suitable for the PyPI administrators, and evaluated how the current solutions perform.

If you come across any interesting topics that you believe should be discussed, please email us at security@ubuntu.com.

Over to you, Alex!
