Biothreat Benchmark Generation (BBG) Framework
For Evaluating Frontier AI Models

Research Sponsored by the AI Safety Fund

Project Summary 

Nemesys Insights, in conjunction with Frontier Design Group and Biosecurity Services, has developed the Bacterial Biothreat Benchmark Generation (BBG) Framework, a novel assessment system aligned with biosecurity threat chains, to strengthen the ability of biothreat benchmarks to provide safety assessments of AI models. The goal of the framework is to identify key areas of potential harm along the biothreat chain, to allow mitigation efforts to be prioritized, and to enhance safety beyond prior benchmarking approaches.

Creation of the BBG Framework included the development of a conceptual architecture for biothreats and then the use of three different approaches – web-mediated prompt generation, extraction from existing corpora, and asynchronous dynamics red teaming – to generate a set of benchmarks that are both aligned to the biosecurity threat chain and diagnostic in the sense of providing uplift over traditional search tools.

The final framework and benchmarks contain over 1,000 bacterial biothreat prompts along with their locations in the biothreat chain, together with implementation tools and tutorials, the associated research publications, and risk mitigation recommendations.

Public versions of the benchmark dataset and implementation tool, as well as associated research publications, can be found below. Please note that, for safety reasons, certain benchmarks, as well as some of the threat-linked functionality, are not available in the publicly available versions posted here. The full set of benchmarks and functions, as described in the accompanying publications, is available upon request to AI developers, government agencies and researchers engaged in responsible AI activities. If you would like access to the restricted versions of the benchmarks, please contact us.

Research Publications

Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models, I: The Task-Query Architecture

Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models, II: Benchmark Generation Process

Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models, III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset

Data and Associated Products
Bacterial Biothreat Benchmark Datasets

Bacterial Biothreat Benchmark Prepopulated [Public]

Bacterial Biothreat Benchmark Agent Agnostic [Public]

Bacterial Biothreat Benchmark Agent & Location Agnostic [Public]

Benchmark Evaluation Tools
Supporting Documentation

