RNA aptamers are short nucleic acid sequences that bind specific molecular targets with high affinity and selectivity. Among the most extensively studied is the theophylline aptamer, a 33-nucleotide RNA that binds the small molecule theophylline with over 10,000-fold selectivity against caffeine, even though the two molecules differ by only a single methyl group. This makes the theophylline aptamer a compelling model for RNA-based recognition, biosensing, and molecular switch design.
The aptamer binds theophylline, a methylxanthine drug, with high affinity (dissociation constant of about 0.4 µM), and its binding pocket and tertiary structure have been well characterized by NMR and crystallography. This makes the RNA–ligand pair an ideal test case for Boltz-1 (“Boltz”), an open-source deep learning model for 3D biomolecular complex prediction.
Modeling such RNA–ligand interactions has direct practical value. Aptamers are increasingly employed in diagnostics, synthetic biology, and therapeutic delivery systems. A high-confidence structural model of the aptamer–ligand complex enables a deeper understanding of molecular recognition mechanisms, supports rational design of RNA devices, and guides the development of aptamer analogs with tailored properties.
Traditionally, elucidating RNA tertiary structures has relied on experimental techniques such as NMR and X-ray crystallography, which are time-consuming, resource-intensive, and limited in throughput. Boltz-1 offers a computational alternative: it is an open-source deep learning model that predicts 3D biomolecular structures involving proteins, RNAs, and small molecules. Trained to capture atomic-level interactions, Boltz-1 can model RNA–ligand complexes de novo, without requiring prior knowledge of tertiary folds or user-defined pockets.
In this guide, we demonstrate step-by-step how to reproduce a Boltz-1 prediction for the theophylline aptamer–theophylline complex. We will prepare the input sequences (RNA and ligand), run the Boltz model, and examine the output including confidence scores. This example is grounded in the experimentally resolved structure (PDB ID 1O15) reported by Clore & Kuszewski (2003) and serves as a benchmark for Boltz’s capabilities in RNA-targeted small molecule modeling. All required inputs and commands are provided so that you can follow along.
Boltz-1 accepts inputs either in a single FASTA file or a YAML configuration. For clarity, we’ll use the YAML format, which explicitly labels each chain and ligand. We need two pieces of information: the RNA sequence and the ligand structure.
Next, we create the YAML input file (e.g. theophylline_aptamer.yaml) with the following content:
    version: 1
    sequences:
      - rna:
          id: [A]
          sequence: GGCGAUACCAGCCGAAAGGCCCUUGGCAGCGUCUU
      - ligand:
          id: [B]
          smiles: "CN1c2c(c(=O)n(c1=O)C)[nH]cn2"
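The same file can also be written from a shell, which makes it easy to check the sequence before anything is uploaded. The following is a minimal sketch, assuming a POSIX-style shell such as bash; the helper name is_valid_rna and the variable names are illustrative and not part of Boltz or DiPhyx.

```shell
#!/bin/bash
# Hypothetical helper (not part of Boltz): succeeds only if the argument
# is non-empty and contains nothing but the RNA letters A, C, G, U.
is_valid_rna() {
    case "$1" in
        ''|*[!ACGU]*) return 1 ;;
        *) return 0 ;;
    esac
}

rna_seq="GGCGAUACCAGCCGAAAGGCCCUUGGCAGCGUCUU"
smiles='CN1c2c(c(=O)n(c1=O)C)[nH]cn2'
yaml_file="theophylline_aptamer.yaml"

if ! is_valid_rna "$rna_seq"; then
    echo "Sequence contains characters other than A, C, G, U" >&2
    exit 1
fi

# Write the input file; indentation is significant in YAML.
cat > "$yaml_file" <<EOF
version: 1
sequences:
  - rna:
      id: [A]
      sequence: ${rna_seq}
  - ligand:
      id: [B]
      smiles: "${smiles}"
EOF

echo "Wrote ${yaml_file}"
```

The check only guards against stray characters such as a DNA "T" or whitespace pasted into the sequence; it does not validate the SMILES string.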
To run Boltz-1 on DiPhyx, you first need to create a compute unit. For this example, we’ll use an AWS g4dn.xlarge instance. Follow the DiPhyx guide for detailed instructions on setting up a cloud compute unit.
If you prefer to run the model locally or on an in-house device, refer to the DiPhyx guides for creating a self-hosted compute unit.
Once the compute unit is in the READY state, proceed to the next step.
1. Click + New Project and search for Boltz-1.
2. In the Create New Project dialog, select an execution method.

The first method provides an interface to configure and run the model using predefined parameters:

- Select the theophylline_aptamer.yaml file by clicking the folder icon in the Input file or directory section.
- The input path is then /volume/boltz_inputs/theophylline_aptamer.yaml. If no specific file is mentioned, Boltz will process all YAML files in the /volume/boltz_inputs/ directory.
- Set the number of recycling steps (--recycling_steps).
- Set the number of diffusion samples (--diffusion_samples).
- Set the cache directory (--cache): /volume/cache
- Set the output directory (--out_dir) where results will be saved.

If you make any mistakes during setup, you can correct them in the next step.
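Because a directory-level run processes every YAML file it finds, it can be worth previewing what is in the input directory first. The sketch below assumes a shell session on the compute unit and a find that supports -maxdepth; list_yaml_inputs is an illustrative name, and matching both the .yaml and .yml extensions is an assumption rather than documented DiPhyx behavior.

```shell
#!/bin/bash
# Illustrative helper: print the YAML files directly inside a directory,
# sorted by name. Returns 1 if the directory does not exist.
list_yaml_inputs() {
    local dir="$1"
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    find "$dir" -maxdepth 1 -type f \( -name '*.yaml' -o -name '*.yml' \) | sort
}

input_dir="/volume/boltz_inputs"
if [ -d "$input_dir" ]; then
    list_yaml_inputs "$input_dir"
else
    echo "Input directory $input_dir not found; nothing to list."
fi
```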
The second, more advanced method lets you upload or create a custom script for running the model. Ensure that the input files are uploaded beforehand. Below is an example script:
    #!/bin/bash

    # Set precision for matrix operations
    export TORCH_FORCE_FLOAT32_MATMUL_PRECISION=medium

    # Run Boltz-1 prediction
    boltz predict /volume/boltz_inputs/theophylline_aptamer.yaml \
      --use_msa_server \
      --out_dir /volume/boltz_output/ \
      --cache /volume/cache \
      --recycling_steps 10 \
      --diffusion_samples 5
Upload this script and execute it to run the prediction.
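Before committing to a long run, it can help to print the exact command and confirm that the input exists. The wrapper below is a sketch under stated assumptions: it reuses the flags and paths from the example script, while the run_boltz function and the DRY_RUN switch are hypothetical names introduced here, not Boltz or DiPhyx features.

```shell
#!/bin/bash
# Settings mirror the example script; override them via the environment.
OUT_DIR="${OUT_DIR:-/volume/boltz_output/}"
CACHE_DIR="${CACHE_DIR:-/volume/cache}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print the command only

# Illustrative wrapper (not a Boltz or DiPhyx feature).
# Usage: run_boltz <input.yaml or input directory>
run_boltz() {
    local input="$1"
    # Rebuild the positional parameters as the full command line, so that
    # paths containing spaces are passed through intact.
    set -- boltz predict "$input" \
        --use_msa_server \
        --out_dir "$OUT_DIR" \
        --cache "$CACHE_DIR" \
        --recycling_steps 10 \
        --diffusion_samples 5
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
        return 0
    fi
    if [ ! -e "$input" ]; then
        echo "Input not found: $input" >&2
        return 1
    fi
    "$@"
}

run_boltz /volume/boltz_inputs/theophylline_aptamer.yaml
```

With the default DRY_RUN=1 the script only prints the command; setting DRY_RUN=0 in the environment makes it check the input path and then run the prediction.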
If this is your first time running this workflow on the compute unit, it may take 1–2 minutes for the model to be pulled and initialized. Once the model is in the Created state, click on it, then select the three dots in the upper-left corner and choose Start. While the model is running, you can monitor the logs and resource usage (CPU, memory, disk), and view the output files by clicking the Browse button in the Volume section.
If you need to modify any parameters or arguments, ensure the workflow is not in the Running state. Navigate to the Command section, click the modification button, and make the necessary changes to the command.
Once the project is fully configured, start the prediction. Boltz-1 will automatically detect the RNA and ligand from the input YAML file, perform the structure prediction, and save the results in the designated output directory.
Inside the output directory, Boltz organizes results by input filename. For our theophylline_aptamer case, expect a folder like /volume/boltz_output/predictions/theophylline_aptamer/. Key files inside include:

- theophylline_aptamer_model_0.cif: predicted 3D structure
- confidence_theophylline_aptamer_model_0.json: confidence metrics
- pae_...npz, plddt_...npz: optional confidence arrays

You can visualize the predicted CIF file in PyMOL or ParaView. Boltz-1 should model the conserved tertiary fold and ligand placement of the theophylline-binding aptamer. Use known PDB entries like 1O15 to compare predictions with experimental data.
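For a quick look at the confidence metrics without opening each file, a small shell helper can pull a single number out of every confidence JSON. This is a rough sketch: json_number is a hypothetical helper that does a text match rather than real JSON parsing, the key name confidence_score is an assumption about the file contents (check your own output), and the directory is the example path given above. For anything beyond a quick check, a JSON-aware tool such as jq is preferable.

```shell
#!/bin/bash
# Hypothetical helper: print the numeric value stored under a key in a
# simple JSON file. This is a text match, not a real JSON parser.
json_number() {
    local key="$1" file="$2"
    sed -n "s/.*\"${key}\"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9.eE+-]*\).*/\1/p" "$file" | head -n 1
}

pred_dir="/volume/boltz_output/predictions/theophylline_aptamer"
for f in "$pred_dir"/confidence_*.json; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    # confidence_score is an assumed key name; adjust to match your file.
    echo "$(basename "$f"): confidence_score=$(json_number confidence_score "$f")"
done
```

When several diffusion samples are requested, each sample has its own confidence file, so the loop prints one line per model and makes it easy to see which sample to open first.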
We strongly suggest adding an IDE to your project for debugging and file management. Such tools provide a user-friendly interface for editing scripts, managing files, and troubleshooting issues directly within your compute unit environment.