Modeling an RNA–Ligand Complex with Boltz-1

RNA aptamers are short nucleic acid sequences that bind specific molecular targets with high affinity and selectivity. One of the most extensively studied is the theophylline aptamer, a 33-nucleotide RNA that discriminates theophylline from caffeine by more than 10,000-fold, even though the two molecules differ by only a single methyl group (at N7 in caffeine). This makes the theophylline aptamer a compelling model for RNA-based recognition, biosensing, and molecular switch design.

Theophylline, a methylxanthine drug, binds the aptamer with high affinity (~0.4 µM), and the aptamer's binding pocket and tertiary structure have been well characterized by NMR and crystallography. This well-defined RNA–ligand pair is an ideal test case for Boltz-1 (“Boltz”), an open-source deep learning model for 3D biomolecular complex prediction.

Modeling such RNA–ligand interactions has direct practical value. Aptamers are increasingly used in diagnostics, synthetic biology, and therapeutic delivery systems. A high-confidence structural model of an aptamer–ligand complex helps explain the mechanism of molecular recognition, supports rational design of RNA devices, and guides the development of aptamer analogs with tailored properties.

Traditionally, RNA tertiary structures have been determined with experimental techniques such as NMR and X-ray crystallography, which are time-consuming, resource-intensive, and limited in throughput. Boltz-1 offers a computational alternative: an open-source deep learning model that predicts 3D structures of biomolecular complexes involving proteins, RNAs, and small molecules. Because it is trained to capture atomic-level interactions, Boltz-1 can model RNA–ligand complexes de novo, without prior knowledge of the tertiary fold or a user-defined binding pocket.

In this guide, we demonstrate step by step how to reproduce a Boltz-1 prediction of the theophylline aptamer–theophylline complex. We prepare the input sequences (RNA and ligand), run the Boltz model, and examine the output, including confidence scores. The example is grounded in the experimentally determined structure (PDB ID 1O15) reported by Clore & Kuszewski (2003) and serves as a benchmark for Boltz’s ability to model small molecules bound to RNA. All required inputs and commands are provided so that you can follow along.

1. Preparing the Input Files

Boltz-1 accepts inputs either in a single FASTA file or a YAML configuration. For clarity, we’ll use the YAML format, which explicitly labels each chain and ligand. We need two pieces of information: the RNA sequence and the ligand structure.

RNA Sequence: The 33-nucleotide sequence of the theophylline-binding aptamer. We use the consensus aptamer sequence (including its stabilizing stems and loop) as reported in the literature. In a Boltz FASTA file this would be marked as an `rna` chain; in YAML, we specify it under `sequences` as an `rna` entry.

Ligand Structure: The ligand is theophylline (chemical formula C₇H₈N₄O₂). We provide it to Boltz as a SMILES string; the SMILES for theophylline is `CN1C2=C(C(=O)N(C1=O)C)NC=N2`.

Next, we combine the RNA sequence and the ligand SMILES into a single YAML input file, e.g. theophylline_aptamer.yaml.
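The input file can also be generated programmatically. The sketch below assumes the Boltz YAML schema (lowercase entity keys `rna` and `ligand`, each with an `id`) and writes theophylline_aptamer.yaml with the aptamer as chain A and theophylline as chain B. The chain IDs are arbitrary choices, and the RNA sequence is our assumption of the 33-nt construct deposited as PDB 1O15; check it against the PDB entry before use.

```python
from pathlib import Path

# ASSUMPTION: 33-nt theophylline aptamer believed to match PDB 1O15;
# verify against the deposited entry before relying on it.
RNA_SEQUENCE = "GGCGAUACCAGCCGAAAGGCCCUUGGCAGCGUC"
# Theophylline (1,3-dimethylxanthine), C7H8N4O2.
THEOPHYLLINE_SMILES = "CN1C2=C(C(=O)N(C1=O)C)NC=N2"


def build_boltz_yaml(rna_seq: str, smiles: str) -> str:
    """Return a Boltz YAML input with one RNA chain (A) and one ligand (B)."""
    # Basic sanity check: an RNA sequence should only contain A, C, G and U.
    invalid = set(rna_seq.upper()) - set("ACGU")
    if invalid:
        raise ValueError(f"non-RNA characters in sequence: {sorted(invalid)}")
    # Quote the SMILES defensively: characters such as '[' at the start of a
    # value have special meaning in YAML.
    return (
        "version: 1\n"
        "sequences:\n"
        "  - rna:\n"
        "      id: A\n"
        f"      sequence: {rna_seq.upper()}\n"
        "  - ligand:\n"
        "      id: B\n"
        f"      smiles: '{smiles}'\n"
    )


if __name__ == "__main__":
    yaml_text = build_boltz_yaml(RNA_SEQUENCE, THEOPHYLLINE_SMILES)
    Path("theophylline_aptamer.yaml").write_text(yaml_text)
    print(yaml_text)
```

Running it prints the YAML and writes it to the current directory, ready to upload to /volume/boltz_inputs/.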

2. Running the Boltz-1 Prediction

2.1 Setting Up a Compute Unit

To run Boltz-1 on DiPhyx, you first need to create a compute unit. For this example, we’ll use an AWS g4dn.xlarge instance. Follow the DiPhyx guide for detailed instructions on setting up a cloud compute unit.

If you prefer to run the model locally or on an in-house machine, refer to the DiPhyx guides for creating a self-hosted compute unit.

2.2 Creating a Boltz-1 Project

Navigate to your compute unit and click on its name.
On the compute unit page, click + New Project and search for Boltz-1.
In the Create New Project dialog, select an execution method:

Parameter Mode

This method provides an interface to configure and run the model using predefined parameters.

Upload the theophylline_aptamer.yaml file by clicking the folder icon in the Input file or directory section.
Set the input path to /volume/boltz_inputs/theophylline_aptamer.yaml. If the path points to a directory (e.g. /volume/boltz_inputs/) rather than a single file, Boltz processes every input file in that directory.
Configure the following parameters:

Recycle Steps (--recycling_steps):
- Value: 20
- Purpose: Refines predictions through iterative cycles.
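- Recommendation: The Boltz default is 3. For a small system like this aptamer, 10–20 recycles are affordable and may improve accuracy. Each recycle adds another pass through the model trunk, so runtime grows with this value; test lower values first if compute time is a concern.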
Diffusion Samples (--diffusion_samples):
- Value: 5
- Purpose: Draws several structure samples from the diffusion module for the same input; Boltz ranks them by confidence score, with model_0 being the top-ranked sample.
- Recommendation: Strongly recommended to capture conformational variability and improve reliability.
Cache Directory (--cache):
- Value: /volume/cache
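- Purpose: Directory where Boltz downloads the model weights and reference data (such as the chemical component dictionary) on first use.
- Recommendation: Keep it on a persistent volume so reruns and batch jobs reuse the downloaded files instead of fetching them again.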

Specify the output directory (--out_dir) where results will be saved, e.g. /volume/boltz_output.

Script Mode

This advanced method lets you upload or write a custom script for running the model. Make sure the input files are uploaded beforehand.
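As an illustration of Script Mode, the sketch below invokes the `boltz predict` command-line interface with the same settings used in Parameter Mode (`--recycling_steps 20`, `--diffusion_samples 5`, `--cache /volume/cache`). The paths, and the assumption that the `boltz` executable is installed on the compute unit's PATH, are ours; adapt them to your project.

```python
import shlex
import shutil
import subprocess


def build_boltz_command(input_path, out_dir, cache_dir,
                        recycling_steps=20, diffusion_samples=5):
    """Assemble the argument list for `boltz predict` (no shell involved)."""
    return [
        "boltz", "predict", str(input_path),
        "--out_dir", str(out_dir),
        "--cache", str(cache_dir),
        "--recycling_steps", str(recycling_steps),
        "--diffusion_samples", str(diffusion_samples),
    ]


if __name__ == "__main__":
    # Paths mirror the Parameter Mode example; adjust to your volume layout.
    cmd = build_boltz_command(
        "/volume/boltz_inputs/theophylline_aptamer.yaml",
        "/volume/boltz_output",
        "/volume/cache",
    )
    print("Command:", shlex.join(cmd))
    if shutil.which("boltz") is None:
        print("boltz not found on PATH; install it first (e.g. pip install boltz).")
    else:
        # check=True raises CalledProcessError if Boltz exits with an error.
        subprocess.run(cmd, check=True)
```

Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.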

2.3 Running the Model

If this is your first time running this workflow on the compute unit, it may take 1–2 minutes for the model to be pulled and initialized. Once the model is in the Created state, click on it, then select the three dots in the upper-left corner and choose Start. While the model is running, you can monitor the logs, resource usage (CPU, memory, disk), and view the output files by clicking the Browse button in the Volume section.

If you need to modify any parameters or arguments, ensure the workflow is not in the Running state. Navigate to the Command section, click the modification button, and make the necessary changes to the command.

Once the project is fully configured, start the prediction. Boltz-1 will automatically detect the RNA and ligand from the input YAML file, perform the structure prediction, and save the results in the designated output directory.

3. Output Files and Results

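Inside the output directory, Boltz organizes results by input filename. For our theophylline_aptamer input, expect a folder like /volume/boltz_output/predictions/theophylline_aptamer/; the Boltz CLI typically places this inside a boltz_results_theophylline_aptamer/ subfolder of --out_dir, so confirm the exact path in the Volume browser. Key files inside include: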

theophylline_aptamer_model_0.cif: predicted 3D structure
confidence_theophylline_aptamer_model_0.json: confidence metrics
pae_...npz, plddt_...npz: optional confidence arrays
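To compare the diffusion samples, a short script can read every confidence JSON in the prediction folder and rank them. The sketch below is a minimal example, assuming the key names used in Boltz-1 confidence files (`confidence_score`, `ptm`, `iptm`, `ligand_iptm`, `complex_plddt`, `complex_iplddt`); it uses `dict.get`, so a key missing in your Boltz version shows up as `None` rather than raising an error. The prediction path in the example is an assumption based on the folder layout described above.

```python
import json
from pathlib import Path

# Metric names as they appear in Boltz-1 confidence JSON files (assumed).
SUMMARY_KEYS = ("confidence_score", "ptm", "iptm", "ligand_iptm",
                "complex_plddt", "complex_iplddt")


def rank_models(pred_dir):
    """Return (file name, metrics) for each model, highest confidence_score first."""
    rows = []
    for path in Path(pred_dir).glob("confidence_*_model_*.json"):
        data = json.loads(path.read_text())
        rows.append((path.name, {key: data.get(key) for key in SUMMARY_KEYS}))

    def score(row):
        value = row[1]["confidence_score"]
        # Models without a score sort last.
        return value if value is not None else float("-inf")

    return sorted(rows, key=score, reverse=True)


if __name__ == "__main__":
    # ASSUMPTION: adjust to the actual predictions folder on your volume.
    pred_dir = "/volume/boltz_output/predictions/theophylline_aptamer"
    for name, metrics in rank_models(pred_dir):
        formatted = ", ".join(f"{k}={v}" for k, v in metrics.items())
        print(f"{name}: {formatted}")
```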

4. Examining the Predicted Complex

You can visualize the predicted CIF file in PyMOL or ChimeraX. We expect Boltz-1 to recover the conserved tertiary fold of the theophylline-binding aptamer and to place the ligand in its binding pocket; use known PDB entries such as 1O15 to check the prediction against experimental data.
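For a quantitative comparison with 1O15, a common approach is to superimpose matched atoms (for example, the C4′ atom of each nucleotide) and report the RMSD after optimal alignment. The sketch below implements the Kabsch algorithm with NumPy. It assumes you have already extracted matched N×3 coordinate arrays from the predicted CIF and the reference structure (e.g. with a structure parser such as Biopython or gemmi, not shown) and have verified the residue correspondence.

```python
import numpy as np


def kabsch_rmsd(pred, ref):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(pred, dtype=float)
    Q = np.asarray(ref, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)

    # The SVD of the covariance matrix yields the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the centered prediction onto the reference and compute the RMSD.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Call it as `kabsch_rmsd(pred_coords, ref_coords)` with coordinates in ångströms; the result is in the same units.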

Tip: Enhancing Your Workflow

We strongly suggest adding an IDE to your project for debugging and file management.

These tools provide a user-friendly interface for editing scripts, managing files, and troubleshooting issues directly within your compute unit environment.

The Theophylline Aptamer and Boltz-1