Daily Picks from Nature, Cell, Science, PNAS

Nature Methods · Nov 24, 2025

TIRTL-seq: deep, quantitative and affordable paired TCR repertoire sequencing

The specificity of T cells is determined by T cell receptor (TCR) α and β chain sequences. While bulk TCR sequencing enables cost-effective repertoire profiling without chain pairing information, single-cell approaches provide paired data but are costly and limited in throughput. Here we present throughput-intensive rapid TCR library sequencing (TIRTL-seq), an experimental and computational methodology for paired TCR repertoire sequencing (TCR-seq). TIRTL-seq is based on the parallel generation of hundreds of TCR libraries in 384-well plates at less than US$200 per plate, allowing cohort-scale paired TCR-seq studies. We benchmarked TIRTL-seq against state-of-the-art bulk TCR-seq and 10x Genomics Chromium technologies on longitudinal samples and identified severe acute respiratory syndrome coronavirus 2- and Epstein–Barr virus-specific clonal expansions after infection with distinct dynamics. TIRTL-seq offers a universal protocol scalable from a single cell to millions of T cells per sample, simultaneously delivering both precise clonal frequency estimation and accurate TCR chain pairing, combining the strengths of bulk and single-cell TCR-seq. TIRTL-seq is a high-throughput method for paired T cell receptor sequencing at the cohort scale.

Adaptive immunity · Immunological techniques · Sequencing · Software · Systems biology · biology

Nature Methods · Nov 18, 2025

ImmunoMatch learns and predicts cognate pairing of heavy and light immunoglobulin chains

The development of stable antibodies formed by compatible heavy (H) and light (L) chain pairs is crucial in both in vivo maturation of antibody-producing cells and ex vivo designs of therapeutic antibodies. We present ImmunoMatch, a machine-learning framework trained on paired H and L sequences from human B cells to identify molecular features underlying chain compatibility. ImmunoMatch distinguishes cognate from random H–L pairs and captures differences associated with κ and λ light chains, reflecting B cell selection mechanisms in the bone marrow. We apply ImmunoMatch to reconstruct paired antibodies from spatial VDJ sequencing data and study the refinement of H–L pairing across B cell maturation stages in health and disease. We find further that ImmunoMatch is sensitive to sequence differences at the H–L interface. These insights provide a computational lens into the broader biological principles governing antibody assembly and stability.

Adaptive immunity · Lymphocytes · Machine learning · Software · biology

Nature Methods · Nov 13, 2025

Bin Chicken: targeted metagenomic coassembly for the efficient recovery of novel genomes

The recovery of microbial genomes from metagenomic datasets has provided genomic representation for hundreds of thousands of species from diverse biomes. However, low-abundance microorganisms are often missed due to insufficient genomic coverage. Here we present Bin Chicken, an algorithm that substantially improves genome recovery through automated, targeted selection of metagenomes for coassembly based on shared marker gene sequences derived from raw reads. Marker gene sequences that are divergent from known reference genomes can be further prioritized, providing an efficient means of recovering highly novel genomes. Applying Bin Chicken to public metagenomes and coassembling 800 sample groups recovered 77,562 microbial genomes, including the first genomic representatives of 6 phyla, 41 classes and 24,028 species. These genomes expand the genomic tree of life and uncover a wealth of novel microbial lineages for further research.
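The idea of choosing coassembly groups by shared marker gene sequences can be pictured with a toy sketch. This is not Bin Chicken's implementation: the function name `rank_coassembly_pairs`, the marker identifiers and the scoring rule (a count of shared marker sequences that are absent from a reference set) are all illustrative assumptions, and the sketch ranks only pairs of samples rather than larger groups.

```python
from itertools import combinations


def rank_coassembly_pairs(sample_markers, known_markers=frozenset()):
    """Rank sample pairs by how many novel marker sequences they share.

    sample_markers: dict mapping a sample name to a set of marker sequence IDs
                    (hypothetical IDs, assumed to come from raw reads).
    known_markers:  set of marker IDs already represented by reference genomes.
    Returns a list of (sample_a, sample_b, shared_novel_count), best first.
    """
    ranked = []
    for a, b in combinations(sorted(sample_markers), 2):
        # Markers seen in both samples, minus those already in references.
        shared_novel = (sample_markers[a] & sample_markers[b]) - known_markers
        if shared_novel:
            ranked.append((a, b, len(shared_novel)))
    # Highest count first; ties broken by sample names for determinism.
    ranked.sort(key=lambda item: (-item[2], item[0], item[1]))
    return ranked


# Hypothetical input: three samples and one marker already in references.
example_samples = {
    "s1": {"m1", "m2", "m3"},
    "s2": {"m2", "m3", "m4"},
    "s3": {"m3", "m9"},
}
example_ranking = rank_coassembly_pairs(example_samples, known_markers={"m9"})
```

Pairs that share more unrepresented markers are more likely to pool enough coverage for the same low-abundance lineage, which is the intuition the abstract describes.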

Data mining · Genome informatics · Metagenomics · Microbial genetics · Software · biology

Nature Methods · Nov 11, 2025

Universal consensus 3D segmentation of cells from 2D segmented stacks

Cell segmentation is the foundation of a wide range of microscopy-based biological studies. Deep learning has revolutionized two-dimensional (2D) cell segmentation, enabling generalized solutions across cell types and imaging modalities. This has been driven by the ease of scaling up image acquisition, annotation and computation. However, three-dimensional (3D) cell segmentation, requiring dense annotation of 2D slices, still poses substantial challenges. Manual labeling of 3D cells to train broadly applicable segmentation models is prohibitive. Even in high-contrast images, annotation is ambiguous and time-consuming. Here we develop a theory and toolbox, u-Segment3D, for 2D-to-3D segmentation, compatible with any 2D method generating pixel-based instance cell masks. u-Segment3D translates and enhances 2D instance segmentations to a 3D consensus instance segmentation without training data, as demonstrated on 11 real-life datasets, comprising >70,000 cells, spanning single cells, cell aggregates and tissue. Moreover, u-Segment3D is competitive with native 3D segmentation, and even exceeds it when cells are crowded and have complex morphologies.

Cellular imaging · Image processing · Machine learning · Software · biology

Nature Methods · Nov 07, 2025

Monod: model-based discovery and integration through fitting stochastic transcriptional dynamics to single-cell sequencing data

Single-cell RNA sequencing analysis centers on illuminating cell diversity and understanding the transcriptional mechanisms underlying cellular function. These datasets are large, noisy and complex. Current analyses prioritize noise removal and dimensionality reduction to tackle these challenges and extract biological insight. We propose an alternative, physical approach to leverage the stochasticity, size and multimodal nature of these data to explicitly distinguish their biological and technical facets while revealing the underlying regulatory processes. With the Python package Monod, we demonstrate how nascent and mature RNA counts, present in most published datasets, can be meaningfully ‘integrated’ under biophysical models of transcription. By using variation in these modalities, we can identify transcriptional modulation not discernible through changes in average gene expression, quantitatively compare mechanistic hypotheses of gene regulation, analyze transcriptional data from different technologies within a common framework and minimize the use of opaque or distortive normalization and transformation techniques.

Computational biophysics · Computational models · Software · Transcriptomics · Single-cell · Genomics · Cell Biology · Machine Learning

Nature Methods · Nov 03, 2025

STORIES: learning cell fate landscapes from spatial transcriptomics using optimal transport

In dynamic biological processes such as development, spatial transcriptomics is revolutionizing the study of the mechanisms underlying spatial organization within tissues. Inferring cell fate trajectories from spatial transcriptomics profiled at several time points has thus emerged as a critical goal, requiring novel computational methods. Wasserstein gradient flow learning is a promising framework for analyzing sequencing data across time, built around a neural network representing the differentiation potential. However, existing gradient flow learning methods face challenges in analyzing spatially resolved transcriptomic data. Here, we propose STORIES, a method that uses an extension of Optimal Transport to learn a spatially informed potential. We benchmark our approach using three large Stereo-seq spatiotemporal atlases and demonstrate superior spatial coherence compared to existing approaches. Finally, we provide an in-depth analysis of axolotl neural regeneration and mouse gliogenesis, recovering gene trends for known markers such as Nptx1 in neuron regeneration and Aldh1l1 in gliogenesis and additional putative drivers.

Computational models · Differentiation · Software · Transcriptomics · biology · mouse experiments

Nature Methods · Oct 30, 2025

Nicheformer: a foundation model for single-cell and spatial omics

Tissue makeup depends on the local cellular microenvironment. Spatial single-cell genomics enables scalable and unbiased interrogation of these interactions. Here we introduce Nicheformer, a transformer-based foundation model trained on both human and mouse dissociated single-cell and targeted spatial transcriptomics data. Pretrained on SpatialCorpus-110M, a curated collection of over 57 million dissociated and 53 million spatially resolved cells across 73 tissues on cellular reconstruction, Nicheformer learns cell representations that capture spatial context. It excels in linear-probing and fine-tuning scenarios for a newly designed set of downstream tasks, in particular spatial composition prediction and spatial label prediction. Critically, we show that models trained only on dissociated data fail to recover the complexity of spatial microenvironments, underscoring the need for multiscale integration. Nicheformer enables the prediction of the spatial context of dissociated cells, allowing the transfer of rich spatial information to scRNA-seq datasets. Overall, Nicheformer sets the stage for the next generation of machine-learning models in spatial single-cell analysis.

Computational models · Machine learning · Software · Transcriptomics · biology · mouse experiments

Nature Methods · Oct 29, 2025

Annotating the genome at single-nucleotide resolution with DNA foundation models

Genome annotation models that directly analyze DNA sequences are indispensable for modern biological research, enabling rapid and accurate identification of genes and other functional elements. Current annotation tools are typically developed for specific element classes and trained from scratch using supervised learning on datasets that are often limited in size. Here we frame the genome annotation problem as multilabel semantic segmentation and introduce a methodology for fine-tuning pretrained DNA foundation models to segment 14 different genic and regulatory elements at single-nucleotide resolution. We leverage the self-supervised pretrained model Nucleotide Transformer to develop a general segmentation model, SegmentNT, capable of processing DNA sequences up to 50-kb long and that achieves state-of-the-art performance on gene annotation, splice site and regulatory elements detection. We also integrated in our framework the foundation models Enformer and Borzoi, extending the sequence context up to 500 kb and enhancing performance on regulatory elements. Finally, we show that a SegmentNT model trained on human genomic elements generalizes to different species, and a multispecies SegmentNT model achieves strong generalization across unseen species. Our approach is readily extensible to additional models, genomic elements and species.
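Framing annotation as multilabel semantic segmentation means every nucleotide gets an independent yes/no decision for each element class, so one position can carry several labels at once. The sketch below illustrates only that output format; it is not SegmentNT's code. The function names, the 0.5 threshold and the hand-written logits are assumptions, and a real model would produce the logits from DNA sequence.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_nucleotides(logits, threshold=0.5):
    """Turn per-nucleotide, per-class logits into a boolean label mask.

    logits: list of rows, one row per nucleotide, one value per element class.
    Classes are scored independently (sigmoid, not softmax), so a nucleotide
    may belong to zero, one or several element classes.
    """
    return [[sigmoid(v) >= threshold for v in row] for row in logits]


def mask_to_intervals(mask, class_index):
    """Collapse one class column of the mask into half-open (start, end) runs."""
    intervals, start = [], None
    for pos, row in enumerate(mask):
        if row[class_index] and start is None:
            start = pos
        elif not row[class_index] and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, len(mask)))
    return intervals


# Hypothetical logits for 4 nucleotides and 2 element classes.
example_logits = [[2.0, -3.0], [1.5, 0.2], [-1.0, 0.4], [3.0, -2.0]]
example_mask = segment_nucleotides(example_logits)
```

In this toy input, position 1 is assigned to both classes, which is what distinguishes multilabel segmentation from assigning each nucleotide a single class.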

Genomics · Machine learning · Software · biology

Nature Methods · Oct 27, 2025

Improved reconstruction of single-cell developmental potential with CytoTRACE 2

While single-cell RNA sequencing has advanced our understanding of cell fate, identifying molecular hallmarks of potency—a cell’s ability to differentiate into other cell types—remains a challenge. Here we introduce CytoTRACE 2, an interpretable deep learning framework for predicting absolute developmental potential from single-cell RNA sequencing data. Across diverse platforms and tissues, CytoTRACE 2 outperformed previous methods in predicting developmental hierarchies, enabling detailed mapping of single-cell differentiation landscapes and expanding insights into cell potency.

Cancer genomics · Machine learning · Software · Stem cells · Transcriptomics · biology

Nature Methods · Oct 22, 2025

scooby: modeling multimodal genomic profiles from DNA sequence at single-cell resolution

Understanding how regulatory sequences shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA sequencing and epigenomic profiling provides opportunities to build models capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multimodal technologies. Here, we introduce scooby, a framework to model genomic profiles of single-cell RNA-sequencing coverage and single-cell assay for transposase-accessible chromatin using sequencing insertions from sequence at single-cell resolution. For this, we leverage the pretrained multiomics profile predictor Borzoi and equip it with a cell-specific decoder. Scooby recapitulates cell-specific expression levels of held-out genes and identifies regulators and their putative target genes. Moreover, scooby allows resolving single-cell effects of bulk expression quantitative trait loci and delineating their impact on chromatin accessibility and gene expression. We anticipate scooby to aid unraveling the complexities of gene regulation at the resolution of individual cells.

Computational models · Machine learning · Software · Transcriptomics · biology

Nature Methods · Oct 20, 2025

CELLECT: contrastive embedding learning for large-scale efficient cell tracking

Quantitative analysis of large-scale cellular behaviors plays an increasingly crucial role in understanding mechanisms of diverse physiopathological processes, but achieving cell tracking with both high performance and efficiency in practical applications remains a challenge. Here we introduce CELLECT, a contrastive embedding learning method for large-scale efficient cell tracking, and demonstrate it on the Caenorhabditis elegans dataset in the Cell Tracking Challenge. By contrastive learning of latent embeddings of diverse cellular structures, a CELLECT model pretrained on a single public dataset can be effectively applied across different imaging modalities and species with broad generalization. Using advanced two-photon imaging, CELLECT enables real-time 3D tracking of large-scale B cells with frequent divisions during germinal center formation in a mouse lymph node, quantitative identification of cell–bacterium interactions in the mouse spleen and high-fidelity extraction of neural signals during strong nonrigid motions. We believe that these results demonstrate broad applications of CELLECT in immunology, pathology and neuroscience.

Fluorescence imaging · Lymphocytes · Software · Systems biology · biology · mouse experiments

Nature Methods · Oct 15, 2025

gReLU: a comprehensive framework for DNA sequence modeling and design

Deep learning models trained on DNA sequences can predict cell-type-specific regulatory activity, reveal cis-regulatory grammar, prioritize genetic variants and design synthetic DNA. However, building and interpreting these models correctly remains difficult, and models and software built by different groups are often not interoperable. Here we present gReLU, a comprehensive software framework that enables advanced sequence modeling pipelines, including data preprocessing, modeling, evaluation, interpretation, variant effect prediction and regulatory element design. gReLU advances deep-learning-based modeling and analysis of DNA sequences with comprehensive toolsets and versatile applications.

Genomics · Machine learning · Software · Genetics · Human

Nature Methods · Oct 13, 2025

Deep generative modeling of sample-level heterogeneity in single-cell genomics

Single-cell genomic studies were recently conducted on hundreds of samples exhibiting complex designs. These data have tremendous potential for discovering how sample- or tissue-level phenotypes relate to cellular and molecular composition. However, current analyses are often based on simplified representations of these data by averaging information across cells. We present multi-resolution variational inference (MrVI), a deep generative model designed to realize the potential of cohort studies at the single-cell level. MrVI tackles two fundamental, intertwined problems: stratifying samples into groups and evaluating the cellular and molecular differences between groups, without requiring predefined cell states. Leveraging its single-cell perspective, MrVI detects clinically relevant stratifications of cohorts of people with COVID-19 or inflammatory bowel disease that are manifested in only certain cellular subsets, enabling new discoveries that would otherwise be overlooked. MrVI can de novo identify groups of small molecules with similar biochemical properties and evaluate their effects on cellular composition and gene expression in large-scale perturbation studies. MrVI is an open-source tool at scvi-tools.org.

Machine learning · Software · Statistical methods · Transcriptomics · biology

Nature Methods · Oct 13, 2025

Multitask benchmarking of single-cell multimodal omics integration methods

Single-cell multimodal omics technologies have empowered the profiling of complex biological systems at a resolution and scale that were previously unattainable. These biotechnologies have propelled the fast-paced innovation and development of data integration methods, leading to a critical need for their systematic categorization, evaluation and benchmarking. Navigating and selecting the most pertinent integration approach poses a considerable challenge, contingent upon the tasks relevant to the study goals and the combination of modalities and batches present in the data at hand. Understanding how well each method performs multiple tasks, including dimension reduction, batch correction, cell type classification and clustering, imputation, feature selection and spatial registration, and at which combinations will help guide this decision. Here we develop a much-needed guideline on choosing the most appropriate method for single-cell multimodal omics data analysis through a systematic categorization and comprehensive benchmarking of current methods. The stage 1 protocol for this Registered Report was accepted in principle on 30 July 2024. The protocol, as accepted by the journal, can be found at https://springernature.figshare.com/articles/journal_contribution/Multi-task_benchmarking_of_single-cell_multimodal_omics_integration_methods/26789902.

Computational models · Data integration · Software · Transcriptomics · biology

Nature Methods · Oct 08, 2025

Cell tracking with accurate error prediction

Cell tracking is an indispensable tool for studying development by time-lapse imaging. However, existing cell trackers cannot assign confidence to predicted tracks, which prohibits fully automated analysis without manual curation. We present a fundamental advance: an algorithm that combines neural networks with statistical physics to determine cell tracks with error probabilities for each step in the track. From these, we can obtain error probabilities for any tracking feature, from cell cycles to lineage trees, that function like P values in data interpretation. Our method, OrganoidTracker 2.0, greatly speeds up tracking analysis by limiting manual curation to rare low-confidence tracking steps. Importantly, it also enables fully automated analysis by retaining only high-confidence track segments, which we demonstrate by analyzing cell cycles and differentiation events at scale for thousands of cells in multiple intestinal organoids. Our approach brings cell dynamics-based organoid screening within reach and enables transparent reporting of cell-tracking results and associated scientific claims.
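How per-step error probabilities could roll up into a confidence for a longer track feature, and how low-confidence steps could be used to cut a track into trustworthy segments, can be sketched as follows. This is not OrganoidTracker 2.0's code or API. The function names and the 0.01 cutoff are hypothetical, and treating step errors as independent is a simplifying assumption made only for this illustration.

```python
from math import prod


def segment_confidence(step_error_probs):
    """Probability that every step in a track segment is correct.

    Assumes, for illustration only, that step errors are independent,
    so the per-step success probabilities simply multiply.
    """
    for p in step_error_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(1.0 - p for p in step_error_probs)


def high_confidence_segments(step_error_probs, max_error=0.01):
    """Split a track at steps whose error probability exceeds max_error.

    Returns half-open (start, end) index ranges over the steps that remain,
    mimicking the idea of keeping only high-confidence track segments.
    """
    segments, start = [], None
    for i, p in enumerate(step_error_probs):
        if p <= max_error:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(step_error_probs)))
    return segments


# Hypothetical per-step error probabilities for one track.
example_steps = [0.001, 0.002, 0.3, 0.001]
example_segments = high_confidence_segments(example_steps)
```

In this toy track, only the third step would need manual review; the steps on either side of it survive as two separate high-confidence segments.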

Confocal microscopy · Differentiation · Image processing · Software · Statistical methods · biology

Nature Methods · Oct 08, 2025

Automated classification of cellular expression in multiplexed imaging data with Nimbus

Multiplexed imaging offers a powerful approach to characterize the spatial topography of tissues in both health and disease. To analyze such data, the specific combination of markers that are present in each cell must be enumerated to enable accurate phenotyping, a process that often relies on unsupervised clustering. We constructed the Pan-Multiplex (Pan-M) dataset containing 197 million distinct annotations of marker expression across 15 different cell types. We used Pan-M to create Nimbus, a deep learning model to predict marker positivity from multiplexed image data. Nimbus is a pretrained model that uses the underlying images to classify marker expression of individual cells as positive or negative across distinct cell types, from different tissues, acquired using different microscope platforms, without requiring any retraining. We demonstrate that Nimbus predictions capture the underlying staining patterns of the full diversity of markers present in Pan-M, and that Nimbus matches or exceeds the accuracy of previous approaches that must be retrained on each dataset. We then show how Nimbus predictions can be integrated with downstream clustering algorithms to robustly identify cell subtypes in image data. We have open-sourced Nimbus and Pan-M to enable community use at https://github.com/angelolab/Nimbus-Inference.

Image processing · Machine learning · Software · biology

Nature Methods · Oct 01, 2025

Giotto Suite: a multiscale and technology-agnostic spatial multiomics analysis ecosystem

Emerging spatial multiomics technologies provide an increasingly large amount of information content at multiple scales. However, it remains challenging to efficiently represent and harmonize diverse spatial datasets. Here we present Giotto Suite, a suite of modular packages that provides scalable and extensible end-to-end solutions for multiscale and multiomic data analysis, integration and visualization. At its core, Giotto Suite is centered around an innovative data framework, allowing the representation and integration of spatial omics data in a technology-agnostic manner. Giotto Suite integrates molecular, morphology, spatial and annotated feature information to create a responsive and flexible workflow, as demonstrated by applications to several state-of-the-art spatial technologies. Furthermore, Giotto Suite builds upon interoperable interfaces and data structures that bridge the established fields of genomics and spatial data science in R, thereby enabling independent developers to create custom-engineered pipelines. As such, Giotto Suite creates an immersive and multiscale ecosystem for spatial multiomic data analysis.

Computational platforms and environments · Software · Transcriptomics · Machine Learning · Genomics · Single-cell

Nature Methods · Sep 29, 2025

InterPLM: discovering interpretable features in protein language models via sparse autoencoders

Despite their success in protein modeling and design, the internal mechanisms of protein language models (PLMs) are poorly understood. Here we present a systematic framework to extract and analyze interpretable features from PLMs using sparse autoencoders. Training sparse autoencoders on ESM-2 embeddings, we identify thousands of interpretable features highlighting biological concepts including binding sites, structural motifs and functional domains. Individual neurons show considerably less conceptual alignment, suggesting PLMs store concepts in superposition. This superposition persists across model scales and larger PLMs capture more interpretable concepts. Beyond known annotations, ESM-2 learns coherent patterns across evolutionarily distinct protein families. To systematically analyze these numerous features, we developed an automated interpretation approach using large language models for feature description and validation. As practical applications, these features can accurately identify missing database annotations and enable targeted steering of sequence generation. Our results show PLM representations can be decomposed into interpretable components, demonstrating the feasibility and utility of mechanistically interpreting these models.
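For readers unfamiliar with sparse autoencoders, the sketch below shows the generic objective such a model minimizes: reconstruct an embedding from a wider, mostly inactive feature vector, with an L1 penalty encouraging sparsity. It is a textbook-style illustration, not the InterPLM code; the tiny hand-set weights, the function names and the 0.1 penalty weight are assumptions, and real training would learn the weights from ESM-2 embeddings by gradient descent.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode an embedding into non-negative features, then decode it back."""
    pre_activation = [a + b for a, b in zip(matvec(w_enc, x), b_enc)]
    features = [max(0.0, v) for v in pre_activation]  # ReLU keeps features sparse
    reconstruction = [a + b for a, b in zip(matvec(w_dec, features), b_dec)]
    return features, reconstruction


def sae_loss(x, features, reconstruction, l1_weight=0.1):
    """Squared reconstruction error plus an L1 sparsity penalty on features."""
    squared_error = sum((a - b) ** 2 for a, b in zip(x, reconstruction))
    sparsity = sum(abs(f) for f in features)
    return squared_error + l1_weight * sparsity


# Hypothetical 2-dimensional embedding expanded into 3 candidate features.
x = [1.0, -1.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

features, reconstruction = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(x, features, reconstruction)
```

Because the feature layer is wider than the embedding and only a few features fire per input, individual features can align with single concepts even when individual neurons do not, which is the superposition argument made in the abstract.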

Protein analysis · Software · Machine Learning · Proteomics · Genomics

Nature Methods · Sep 25, 2025

Merging conformational landscapes in a single consensus space with FlexConsensus algorithm

Structural heterogeneity analysis in cryogenic electron microscopy is experiencing a breakthrough in estimating more accurate, richer and interpretable conformational landscapes derived from experimental data. The emergence of new methods designed to tackle the heterogeneity challenge reflects this new paradigm, enabling users to gain a better understanding of protein dynamics. However, the question of how intrinsically different heterogeneity algorithms compare remains unsolved, which is crucial for determining the reliability, stability and correctness of the estimated conformational landscapes. Here, to overcome the previous challenge, we introduce FlexConsensus: a multi-autoencoder neural network able to learn the commonalities and differences among several conformational landscapes, enabling them to be placed in a shared consensus space with enhanced reliability. The consensus space enables the measurement of reproducibility in heterogeneity estimations, allowing users to either focus their analysis on particles with a stable estimation of their structural variability or concentrate on specific particle subsets detected by only certain methods.

Image processing · Machine learning · Software · Structural Biology · Cryo-EM

Nature Methods · Sep 25, 2025

EpiAgent: foundation model for single-cell epigenomics

Although single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) enables the exploration of the epigenomic landscape that governs transcription at the cellular level, the complicated characteristics of the sequencing data and the broad scope of downstream tasks mean that a sophisticated and versatile computational method is urgently needed. Here we introduce EpiAgent, a foundation model pretrained on our manually curated large-scale Human-scATAC-Corpus. EpiAgent encodes chromatin accessibility patterns of cells as concise ‘cell sentences’ and captures cellular heterogeneity behind regulatory networks via bidirectional attention. Comprehensive benchmarks show that EpiAgent excels in typical downstream tasks, including unsupervised feature extraction, supervised cell type annotation and data imputation. By incorporating external embeddings, EpiAgent enables effective cellular response prediction for both out-of-sample stimulated and unseen genetic perturbations, reference data integration and query data mapping. Through in silico knockout of cis-regulatory elements, EpiAgent demonstrates the potential to model cell state changes. EpiAgent is further extended to directly annotate cell types in a zero-shot manner.

Computational models · Data integration · Machine learning · Software · Single-cell · Genomics · Human

Nature Methods · Sep 18, 2025

GPU-accelerated homology search with MMseqs2

Rapidly growing protein databases demand faster sensitive search tools. Here the graphics processing unit (GPU)-accelerated MMseqs2 delivers 6× faster single-protein searches than CPU methods on 2 × 64 cores, speeds previously requiring large protein batches. For larger query batches, it is the most cost-effective solution, outperforming the fastest alternative method by 2.4-fold with eight GPUs. It accelerates protein structure prediction with ColabFold 31.8× over the standard AlphaFold2 pipeline and protein structure search with Foldseek by 4–27×. MMseqs2-GPU is available under an open-source license at https://mmseqs.com/.

Hardware and infrastructure · Protein analysis · Protein function predictions · Protein structure predictions · Software · Machine Learning · Proteomics · Genomics

Nature Methods · Sep 15, 2025

Cancer subclone detection based on DNA copy number in single-cell and spatial omic sequencing data

Somatic mutations such as copy number alterations accumulate during cancer progression, driving intratumor heterogeneity that impacts therapy effectiveness. Understanding the characteristics and spatial distribution of genetically distinct subclones is essential for unraveling tumor evolution and improving cancer treatment. Here we present Clonalscope, a subclone detection method using copy number profiles, applicable to spatial transcriptomics and single-cell sequencing data. Clonalscope implements a nested Chinese Restaurant Process to identify de novo tumor subclones, which can incorporate prior information from matched bulk DNA sequencing data for improved subclone detection and malignant cell labeling. On single-cell RNA sequencing and single-cell assay for transposase-accessible chromatin using sequencing data from gastrointestinal tumors, Clonalscope successfully labeled malignant cells and identified genetically different subclones with thorough validations. On spatial transcriptomics data from various primary and metastasized tumors, Clonalscope labeled malignant spots, traced subclones and identified spatially segregated subclones with distinct differentiation levels and expression of genes associated with drug resistance and survival.

Cancer genomics · Genomics · Software · Statistical methods · Tumour heterogeneity · Cancer · Single-cell · Machine Learning · Drug Development

Nature Methods · Sep 15, 2025

De novo discovery of conserved gene clusters in microbial genomes with Spacedust

Metagenomics has revolutionized environmental and human-associated microbiome studies. However, the limited fraction of proteins with known biological processes and molecular functions presents a major bottleneck. In prokaryotes and viruses, evolution favors keeping genes participating in the same biological processes colocalized as conserved gene clusters. Conversely, conservation of gene neighborhood indicates functional association. Here we present Spacedust, a tool for systematic, de novo discovery of conserved gene clusters. To find homologous protein matches, Spacedust uses fast and sensitive structure comparison with Foldseek. Partially conserved clusters are detected using novel clustering and order conservation P values. We demonstrate Spacedust’s sensitivity with an all-versus-all analysis of 1,308 bacterial genomes, identifying 72,843 conserved gene clusters containing 58% of the 4.2 million genes. It recovered 95% of antiviral defense system clusters annotated by the specialized tool PADLOC. Spacedust’s high sensitivity and speed will facilitate the annotation of large numbers of sequenced bacterial, archaeal and viral genomes.

Genome informatics · Metagenomics · Software · Microbiology · Genomics · Machine Learning

Nature Methods · Sep 08, 2025

Scvi-hub: an actionable repository for model-driven single-cell analysis

The growing availability of single-cell omics datasets presents new opportunities for reuse, while challenges in data transfer, normalization and integration remain a barrier. Here we present scvi-hub: a platform for efficiently sharing and accessing single-cell omics datasets using pretrained probabilistic models. It enables immediate execution of fundamental tasks like visualization, imputation, annotation and deconvolution on new query datasets using state-of-the-art methods, with massively reduced storage and compute requirements. We show that pretrained models support efficient analysis of large references, including the CZI CELLxGENE Discover Census. Scvi-hub is built within the scvi-tools open-source environment and integrated into scverse. Scvi-hub offers a scalable and user-friendly framework for accessing and contributing to a growing ecosystem of ready-to-use models and datasets, thus putting the power of atlas-level analysis at the fingertips of a broad community of users.

Machine learning · Software · Statistical methods · Transcriptomics · Single-cell · Genomics · Human
