Latest Articles

25 articles
Active filters: Nature Methods, Machine learning







Nature Methods · Oct 29, 2025

Annotating the genome at single-nucleotide resolution with DNA foundation models

Genome annotation models that directly analyze DNA sequences are indispensable for modern biological research, enabling rapid and accurate identification of genes and other functional elements. Current annotation tools are typically developed for specific element classes and trained from scratch using supervised learning on datasets that are often limited in size. Here we frame the genome annotation problem as multilabel semantic segmentation and introduce a methodology for fine-tuning pretrained DNA foundation models to segment 14 different genic and regulatory elements at single-nucleotide resolution. We leverage the self-supervised pretrained model Nucleotide Transformer to develop a general segmentation model, SegmentNT, capable of processing DNA sequences up to 50-kb long and that achieves state-of-the-art performance on gene annotation, splice site and regulatory elements detection. We also integrated in our framework the foundation models Enformer and Borzoi, extending the sequence context up to 500 kb and enhancing performance on regulatory elements. Finally, we show that a SegmentNT model trained on human genomic elements generalizes to different species, and a multispecies SegmentNT model achieves strong generalization across unseen species. Our approach is readily extensible to additional models, genomic elements and species.

Genomics Machine learning Software biology





Nature Methods · Oct 08, 2025

Automated classification of cellular expression in multiplexed imaging data with Nimbus

Multiplexed imaging offers a powerful approach to characterize the spatial topography of tissues in both health and disease. To analyze such data, the specific combination of markers that are present in each cell must be enumerated to enable accurate phenotyping, a process that often relies on unsupervised clustering. We constructed the Pan-Multiplex (Pan-M) dataset containing 197 million distinct annotations of marker expression across 15 different cell types. We used Pan-M to create Nimbus, a deep learning model to predict marker positivity from multiplexed image data. Nimbus is a pretrained model that uses the underlying images to classify marker expression of individual cells as positive or negative across distinct cell types, from different tissues, acquired using different microscope platforms, without requiring any retraining. We demonstrate that Nimbus predictions capture the underlying staining patterns of the full diversity of markers present in Pan-M, and that Nimbus matches or exceeds the accuracy of previous approaches that must be retrained on each dataset. We then show how Nimbus predictions can be integrated with downstream clustering algorithms to robustly identify cell subtypes in image data. We have open-sourced Nimbus and Pan-M to enable community use at https://github.com/angelolab/Nimbus-Inference.

Image processing Machine learning Software biology

Nature Methods · Oct 03, 2025

All-at-once RNA folding with 3D motif prediction framed by evolutionary information

Structural RNAs exhibit a vast array of recurrent short three-dimensional (3D) elements found in loop regions involving non-Watson–Crick interactions that help arrange canonical double helices into tertiary structures. Here we present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. Here we further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly noncovarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over 50 known RNA motifs (‘everything’). Motifs can appear in any nonhelical loop region (including three-way, four-way and higher junctions) (‘everywhere’). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar (‘all-at-once’). Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. CaCoFold-R3D is fast and easily customizable for novel motif discovery and shows promising value both as a strong input for deep learning approaches to all-atom structure prediction and toward guiding RNA design as drug targets for therapeutic small molecules.

Computational models Machine learning Non-coding RNAs Riboswitches biology

Nature Methods · Oct 02, 2025

Foundation model for efficient biological discovery in single-molecule time traces

Single-molecule fluorescence microscopy (SMFM) can reveal important biological insights. However, uncovering rare but critical intermediates often demands manual inspection of time traces and iterative ad hoc approaches. To facilitate systematic and efficient discovery from SMFM time traces, we introduce META-SiM, a transformer-based foundation model pretrained on diverse SMFM analysis tasks. META-SiM rivals best-in-class algorithms on a broad range of tasks including trace classification, segmentation, idealization and stepwise photobleaching analysis. Additionally, the model produces embeddings that encapsulate detailed information about each trace, which the web-based META-SiM Projector (https://www.simol-projector.org) casts into lower-dimensional space for efficient whole-dataset visualization, labeling, comparison and sharing. Combining this Projector with the objective metric of local Shannon entropy enables rapid identification of condition-specific behaviors, even if rare or subtle. Applying META-SiM to an existing single-molecule Förster resonance energy transfer dataset, we discover a previously undetected intermediate state in pre-mRNA splicing. META-SiM removes bottlenecks, improves objectivity and both systematizes and accelerates biological discovery in single-molecule data.

Machine learning Single-molecule biophysics Single-cell Machine Learning Structural Biology





Nature Methods · Sep 15, 2025

Integrating diverse experimental information to assist protein complex structure prediction by GRASP

Protein complex structure prediction is crucial for understanding biological activities and advancing drug development. While various experimental methods can provide structural insights into protein complexes, the knowledge obtained is often sparse or approximate. A general tool is needed to integrate limited experimental information for high-throughput and accurate prediction. Here we introduce GRASP to efficiently and flexibly incorporate diverse forms of experimental information. GRASP outperforms existing tools in handling both simulated and real-world experimental restraints including those from crosslinking, covalent labeling, chemical shift perturbation and deep mutational scanning. For example, GRASP excels at predicting antigen–antibody complex structures, even surpassing AlphaFold3 when using experimental deep mutational scanning or covalent-labeling restraints. Beyond its accuracy and flexibility in restrained structure prediction, GRASP’s ability to integrate multiple forms of restraints enables integrative modeling. We also showcase its potential in modeling the protein structural interactome under near-cellular conditions using previously reported large-scale in situ crosslinking data for mitochondria.

Cryoelectron microscopy Machine learning Protein structure predictions Solution-state NMR Structural Biology Proteomics Machine Learning Drug Development

Nature Methods · Sep 15, 2025

Scaling up spatial transcriptomics for large-sized tissues: uncovering cellular-level tissue architecture beyond conventional platforms with iSCALE

Recent advances in spatial transcriptomics (ST) technologies have transformed our ability to profile gene expression while preserving crucial spatial context within tissues. However, existing ST platforms are constrained by high costs, long turnaround times, low resolution, limited gene coverage and inherently small tissue capture areas, which hinder their broad applications. Here we present iSCALE, a method that reconstructs large-scale, super-resolution gene expression landscapes and automatically annotates cellular-level tissue architecture in samples exceeding capture areas of current ST platforms. The performance of iSCALE was assessed by comprehensive evaluations involving benchmarking experiments, immunohistochemistry staining and manual annotations by pathologists. When applied to multiple sclerosis human brain samples, iSCALE uncovered lesion-associated cellular characteristics undetectable by conventional ST experiments. Our results demonstrate the utility of iSCALE in analyzing large tissues by enabling unbiased annotation, resolving cell type composition, mapping cellular microenvironments and revealing spatial features beyond the reach of standard ST analysis or routine histopathological assessment.

Gene expression analysis Machine learning RNA sequencing Transcriptomics Neuroscience Single-cell Genomics Human Machine Learning