Nature Biotechnology · Nov 11, 2025
Multimodal learning enables chat-based exploration of single-cell data
Single-cell sequencing characterizes biological samples at unprecedented scale and detail, but data interpretation remains challenging. Here, we present CellWhisperer, an artificial intelligence (AI) model and software tool for chat-based interrogation of gene expression. We establish a multimodal embedding of transcriptomes and their textual annotations, using contrastive learning on 1 million RNA sequencing profiles with AI-curated descriptions. This embedding informs a large language model that answers user-provided questions about cells and genes in natural-language chats. We benchmark CellWhisperer’s performance for zero-shot prediction of cell types and other biological annotations and demonstrate its use for biological discovery in a meta-analysis of human embryonic development. We integrate a CellWhisperer chat box with the CELLxGENE browser, allowing users to interactively explore gene expression through a combined graphical and chat interface. In summary, CellWhisperer leverages large community-scale data repositories to connect transcriptomes and text, thereby enabling interactive exploration of single-cell RNA-sequencing data with natural-language chats.
Gene regulation in immune cells · Machine learning · Preclinical research · Software · Transcriptomics · biology

Nature Biotechnology · Oct 24, 2025
Deep-learning-based virtual screening of antibacterial compounds
The increase in multidrug-resistant bacteria underscores an urgent need for additional antibiotics. Here, we integrate small-molecule high-throughput screening with a deep-learning-based virtual screening approach to uncover new antibacterial compounds. We screen ~2 million small molecules against a sensitized Escherichia coli strain, yielding thousands of hits. We use these data to train a deep learning model, GNEprop, to predict antibacterial activity, retrospectively validating robustness with respect to out-of-distribution generalization and activity cliff prediction. Virtual screening of over 1.4 billion synthetically accessible compounds identifies potential candidates, of which 82 exhibit antibacterial activity on the same strain, illustrating a 90-fold improved hit rate over the high-throughput screening experiment used for training. Many newly identified compounds exhibit high dissimilarity to known antibiotics, potency beyond the training bacterial strain and selectivity. Biological characterization identifies specific, validated targets, indicating promising avenues for further exploration in antibiotic discovery.
High-throughput screening · Machine learning · Virtual drug screening · Virtual screening · biology

Nature Biotechnology · Oct 16, 2025
Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies offer potential advantages in repeat mapping and variant phasing. We present DeepSomatic, a deep-learning method for detecting somatic small nucleotide variations and insertions and deletions from both short-read and long-read data. The method has modes for whole-genome and whole-exome sequencing and can run on tumor–normal, tumor-only and formalin-fixed paraffin-embedded samples. To train DeepSomatic and help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available the Cancer Standards Long-read Evaluation (CASTLE) dataset of six matched tumor–normal cell line pairs whole-genome sequenced with Illumina, PacBio HiFi and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples, both cell line and patient-derived, and across short-read and long-read sequencing technologies, DeepSomatic consistently outperforms existing callers.
Genome informatics · Machine learning · biology