N Nature Genetics · Nov 27, 2025 A genome-wide association study of mass spectrometry proteomics using a nanoparticle enrichment platform Most studies to date of protein quantitative trait loci (pQTLs) have relied on affinity proteomics platforms, which provide only limited information about the targeted protein isoforms and may be affected by genetic variation in their epitope binding. Here we show that mass spectrometry (MS)-based proteomics can complement these studies and provide insights into the role of specific protein isoform and epitope-altering variants. Using the Seer Proteograph nanoparticle enrichment MS platform, we identified and replicated new pQTLs in a genome-wide association study of proteins in blood plasma samples from two cohorts and evaluated previously reported pQTLs from affinity proteomics platforms. We found that >30% of the evaluated pQTLs were confirmed by MS proteomics to be consistent with the hypothesis that genetic variants induce changes in protein abundance, whereas another 30% could not be replicated and are possibly due to epitope effects, although alternative explanations for nonreplication need to be considered on a case-by-case basis. Genetics research Genome-wide association studies Proteomics biology
N Nature Genetics · Nov 20, 2025 Scalable and accurate rare variant meta-analysis with Meta-SAIGE Meta-analysis enhances the power of rare variant association tests by combining summary statistics across several cohorts. However, existing methods often fail to control type I error for low-prevalence binary traits and are computationally intensive. Here we introduce Meta-SAIGE—a scalable method for rare variant meta-analysis that accurately estimates the null distribution to control type I error and reuses the linkage disequilibrium matrix across phenotypes to boost computational efficiency in phenome-wide analyses. Simulations using UK Biobank whole-exome sequencing data show that Meta-SAIGE effectively controls type I error and achieves power comparable to pooled individual-level analysis with SAIGE-GENE+. Applying Meta-SAIGE to 83 low-prevalence phenotypes in UK Biobank and All of Us whole-exome sequencing data identified 237 gene–trait associations. Notably, 80 of these associations were not significant in either dataset alone, underscoring the power of our meta-analysis. Bioinformatics Genetics research Genome-wide association studies Software Genetics Genomics Human Clinical
N Nature Genetics · Nov 17, 2025 African-ancestry-specific variant IKKβ p.Glu502Lys confers high lupus risk Cutaneous lupus erythematosus (CLE) is an autoimmune disease of the skin, occurring with or without systemic lupus erythematosus (SLE). People with African ancestry have a higher risk than people with other ancestries of developing lupus [1] but have been underrepresented in genetic studies. We whole-genome-sequenced 27,820 Americans with genetically inferred African ancestry from the Diverse Ancestry Cohort, including people with CLE (n = 211) and/or SLE (n = 574). We discovered an association with a rare missense variant in IKBKB, rs115698972G>A, IKKβ E502K, exclusive to people with African ancestry, conferring an odds ratio (OR) of 5.4 for CLE and 3.3 for SLE. These associations replicated in the All of Us and VA Million Veteran Research Programs for CLE (OR_meta = 3.8, P_meta = 5.3 × 10^−20, n = 1,243) and SLE (OR_meta = 3.2, P_meta = 1.0 × 10^−19, n = 1,697). In this cohort, IKKβ E502K accounts for 10.4% of CLE cases and 6.4% of SLE cases, confers a high lupus risk, and contributes substantially to the disease prevalence among people with African ancestry. This highlights the value of including diverse ancestries in genetic association studies. Genome-wide association studies Systemic lupus erythematosus biology
N Nature Genetics · Nov 14, 2025 Genome-wide association study and polygenic risk prediction of hypothyroidism We performed a genome-wide meta-analysis of hypothyroidism (113,393 cases and 1,065,268 controls), free thyroxine (191,449 individuals) and thyroid-stimulating hormone (482,873 individuals). We identified 350 loci associated with hypothyroidism, including 179 not previously reported, 29 of which were linked through thyroid-stimulating hormone. We found that many hypothyroidism risk loci regulate blood cell counts and the circulating inflammasome, and through multiple gene-mapping strategies, we prioritized 259 putative causal genes enriched in immune-related functions. We developed a polygenic risk score (PRS) based on more than 115,000 hypothyroidism cases to address diagnostic challenges in individuals with or at risk of thyroid hormone deficiency. We show that the highest predictive accuracy for hypothyroidism was achieved when combining the PRS with thyroid hormones and thyroid-peroxidase autoantibodies, and that the PRS was able to stratify risk of progression among individuals with subclinical hypothyroidism. These findings demonstrate the potential for a hypothyroidism PRS to support the prediction of disease progression and onset in thyroid hormone deficiency. Genome-wide association studies Thyroid diseases biology
N Nature Genetics · Nov 12, 2025 Computationally efficient meta-analysis of gene-based tests using summary statistics in large-scale genetic studies Meta-analysis of gene-based tests using single-variant summary statistics is a powerful strategy for genetic association studies. However, current approaches require sharing the covariance matrix between variants for each study and trait of interest. For large-scale studies with many phenotypes, these matrices can be cumbersome to calculate, store and share. Here, to address this challenge, we present REMETA—an efficient tool for meta-analysis of gene-based tests. REMETA uses a single sparse covariance reference file per study that is rescaled for each phenotype using single-variant summary statistics. We develop new methods for binary traits with case–control imbalance, and to estimate allele frequencies, genotype counts and effect sizes of burden tests. We demonstrate the performance and advantages of our approach through meta-analysis of five traits in 469,376 samples in UK Biobank. The open-source REMETA software will facilitate meta-analysis across large-scale exome sequencing studies from diverse studies that cannot easily be combined. Genome-wide association studies Software Genetics Genomics Human Meta-analysis
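The rescaling idea in the REMETA abstract above can be illustrated with a small sketch. This is not REMETA's implementation or file format. It assumes each study supplies per-variant score statistics, per-variant score variances derived from single-variant summary statistics, and one reference correlation (LD) matrix shared across phenotypes. The function names `rescale_covariance`, `burden_chi2` and `meta_burden_chi2` are hypothetical, and only the Python standard library is used.

```python
import math


def rescale_covariance(ref_corr, score_var):
    """Scale a reference LD correlation matrix into a phenotype-specific
    covariance matrix: cov[i][j] = corr[i][j] * sd[i] * sd[j]."""
    sd = [math.sqrt(v) for v in score_var]
    m = len(sd)
    return [[ref_corr[i][j] * sd[i] * sd[j] for j in range(m)] for i in range(m)]


def burden_chi2(scores, cov, weights=None):
    """One-degree-of-freedom burden statistic (w'U)^2 / (w'Vw)."""
    m = len(scores)
    w = [1.0] * m if weights is None else list(weights)
    numerator = sum(w[i] * scores[i] for i in range(m)) ** 2
    denominator = sum(w[i] * cov[i][j] * w[j] for i in range(m) for j in range(m))
    return numerator / denominator


def meta_burden_chi2(studies, weights=None):
    """Sum score statistics and rescaled covariances across studies, then test.

    `studies` is a list of (scores, ref_corr, score_var) tuples that all
    cover the same variants in the same order (an assumption of this sketch).
    """
    m = len(studies[0][0])
    total_scores = [0.0] * m
    total_cov = [[0.0] * m for _ in range(m)]
    for scores, ref_corr, score_var in studies:
        cov = rescale_covariance(ref_corr, score_var)
        for i in range(m):
            total_scores[i] += scores[i]
            for j in range(m):
                total_cov[i][j] += cov[i][j]
    return burden_chi2(total_scores, total_cov, weights)
```

Because the correlation matrix is stored once per study and only the variances change per phenotype, the same reference file can serve many traits. Under the null hypothesis the returned statistic would be compared with a chi-squared distribution with one degree of freedom.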
N Nature Genetics · Nov 07, 2025 Genetic basis of flavor complexity in sweet corn Sweet corn is an important vegetable crop consumed globally. However, the genetic differentiation between field corn and sweet corn, and the impact of breeding on the metabolite composition and flavor (other than sweetness) of sweet corn, remain poorly understood. Here we assembled a cultivated sweet-corn genome de novo and re-sequenced 295 diverse sweet-corn inbred lines. We examined the genetic architecture of sweet-corn kernel quality by combining genetic, metabolite and expression profiling methodologies. New genes (for example, ZmAPS1, ZmSK1 and ZmCRR5) and metabolites associated with flavor and consumer preference were identified, highlighting important target flavor metabolites, including sugars, acids and volatiles. These findings provide valuable knowledge and targets for future genetic breeding of sweet-corn flavor, and to balance grain yield and quality and contribute to our broader understanding of crop diversification. Genome-wide association studies Plant genetics Population genetics biology
N Nature Genetics · Nov 04, 2025 Multi-ancestry genome-wide association analyses of polycystic ovary syndrome Polycystic ovary syndrome (PCOS), the leading endocrine disorder in women of reproductive age, is highly heritable, yet its polygenic architecture remains poorly understood. Here we conducted a genome-wide association study on 12,419 Chinese women with PCOS and 34,235 controls, followed by a multi-ancestry meta-analysis with up to 13,773 European cases and 411,088 controls, identifying 94 independent loci, 73 of which were previously unreported. Despite different evolutionary pressures, Chinese and European ancestries showed substantial genetic overlap. Integrative functional analyses prioritized regulatory variants controlling gene activity in specific tissues, disease-causing genes including anti-Müllerian hormone (AMH), and biological pathways involving ligand-binding domain interactions and peroxisome proliferator-activated receptor gamma (PPARG) signaling. We identified granulosa cells as particularly important in PCOS development. Our genetics-driven drug discovery approach revealed multiple drug targets and repurposing opportunities, enabling personalized treatment strategies. These results enhance our understanding of the molecular basis of PCOS, paving the way for precision medicine. Genome-wide association studies Polycystic ovary syndrome biology
N Nature Genetics · Nov 04, 2025 Genetic associations with educational fields Educational field choices shape careers, wellbeing and the societal skill distribution, yet genetic influences on what people study remain poorly understood. Here we show that genetic factors are associated with educational field specializations using genome-wide association studies (GWASs) across 463,134 individuals from Finland, Norway and the Netherlands (effective n between 40,072 and 317,209). We identified 17 independent genome-wide significant variants linked to 7 of 10 educational fields, with average heritability of 7%. The genetic signal is specific to field choice rather than educational level, persisting after controlling for years of schooling and confounding factors. By examining genetic clustering across specializations, we uncovered two key dimensions: technical versus social and practical versus abstract. We performed GWASs of these components and demonstrated distinct genetic correlations with personality, behavior and socioeconomic status. Our findings demonstrate that genomic research can illuminate ‘horizontal’ stratification, revealing insights into vocational interests and social sorting beyond traditional attainment measures. Behavioural genetics Genetics research Genome-wide association studies Psychiatric disorders Genetics GWAS Human Social Sciences
N Nature Genetics · Nov 03, 2025 Integrated metabolomic and transcriptomic analyses identify MYB genes regulating key metabolites and agronomic traits in upland cotton Gossypium hirsutum Understanding early embryonic development is fundamental for unraveling plant cell differentiation and organogenesis. Here we integrate multiomics data from 403 upland cotton ovules to identify 2,960 metabolic quantitative trait loci and 24,485 expression quantitative trait loci. A key locus, ME_A07, influences 252 known metabolite levels and the expression of 4,293 genes, with the MYB gene GhTT2_A07 identified as the central regulator, potentially regulated by a 520 kb inversion. GhTT2_A07 orchestrated both primary and secondary metabolite biosynthesis, influencing agronomic traits. Another locus, ME_A06, driven by the MYB gene Proanthocyanidin Regulator (GhPAR), modulates proanthocyanidin content and suggests an ecological adaptation. GhTT2_A07 and GhPAR exhibit both shared and distinct expression profiles, contributing variably to fiber quality and yield. These findings highlight the critical role of MYB genes in the early development of cotton ovules and fibers, offering comprehensive multiomics resources that advance cotton research and molecular breeding. Gene regulation Genome-wide association studies Metabolomics biology
N Nature Genetics · Nov 03, 2025 Liability threshold model-based disease risk prediction based on electronic health record phenotypes Electronic health records have been increasingly adopted as useful resources for genomic research. However, case–control labeling of clinical data from electronic health records is challenging and most studies utilize phenotype codes to define case/control labels, resulting in suboptimal downstream analyses. Here we describe the liability threshold phenotypic integration, a method combining genetic relatedness with phenotypic data, including binary and continuous traits such as diagnosis codes, family disease history, laboratory measurements and biomarkers, to derive new continuous phenotypes for target diseases. The model utilizes an automatic trait selection algorithm that increases performance in disease risk prediction and provides insights into nontarget traits associated with the target disease. Our simulations and applications to the eMERGE network and the UK Biobank data demonstrate consistent performance gains in disease risk prediction and genome-wide association study power compared to conventional phenotype codes, models that solely incorporate family history and the phenotype imputation method SoftImpute, with similar false-positive rate control. Genetics research Genome-wide association studies Genetics Machine Learning Human Clinical
N Nature Genetics · Oct 31, 2025 An African ancestry-specific nonsense variant in CD36 is associated with a higher risk of dilated cardiomyopathy The high burden of dilated cardiomyopathy (DCM) in individuals of African descent remains incompletely explained. Here, to explore a genetic basis, we conducted a genome-wide association study in 1,802 DCM cases and 93,804 controls of African genetic ancestry (AFR). A nonsense variant (rs3211938:G) in CD36 was associated with increased risk of DCM. This variant, believed to be under positive selection due to a protective role in malaria resistance, is present in 17% of AFR individuals but <0.1% of European genetic ancestry (EUR) individuals. Homozygotes for the risk allele, who comprise ~1% of the AFR population, had approximately threefold higher odds of DCM. Among those without clinical cardiomyopathy, homozygotes exhibited an 8% absolute reduction in left ventricular ejection fraction. In AFR, the DCM population attributable fraction for the CD36 variant was 8.1%. This single variant accounted for approximately 20% of the excess DCM risk in individuals of AFR compared to those of EUR. Experiments in human induced pluripotent stem cell-derived cardiomyocytes demonstrated that CD36 loss of function impairs fatty acid uptake and disrupts cardiac metabolism and contractility. These findings implicate CD36 loss of function and suboptimal myocardial energetics as a prevalent cause of DCM in individuals of African descent. Cardiomyopathies Genome-wide association studies biology
N Nature Genetics · Oct 13, 2025 Statistical construction of calibrated prediction intervals for polygenic score-based phenotype prediction Accurately quantifying uncertainty in predicted phenotypes from polygenic score (PGS)-based applications is essential for reliable clinical interpretation of PGS, supporting effective disease risk assessment and informed decision-making. Here, we present PredInterval, a nonparametric method for constructing well-calibrated prediction intervals. PredInterval is compatible with any PGS method, takes either individual-level data or summary statistics as input and relies on information from quantiles of phenotypic residuals through cross-validation to achieve well-calibrated coverage of true phenotypic values across diverse genetic architectures. We apply PredInterval to analyze 17 traits in real-data applications, where PredInterval not only represents the sole method achieving well-calibrated prediction coverage across traits, but it also offers a principled approach to identify high-risk individuals using prediction intervals, leading to an average improvement of identification rates by 8.7–830.4% compared with existing approaches. Overall, PredInterval represents a robust and versatile tool for enhancing the clinical utility of PGS. Genome-wide association studies Statistics Genetics Machine Learning Human Clinical
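The general idea mentioned in the PredInterval abstract above, building intervals from quantiles of cross-validated residuals, can be sketched as follows. This is a generic illustration, not the PredInterval algorithm. It assumes a single polygenic score per person, a simple linear fit of phenotype on score, and a symmetric interval whose half-width is a quantile of the held-out absolute residuals. The names `fit_line` and `cv_residual_interval` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cv_residual_interval(pgs, phenotype, new_pgs, alpha=0.1, n_folds=5, seed=0):
    """Return (lower, upper) intervals with target coverage 1 - alpha."""
    n = len(pgs)
    order = list(range(n))
    random.Random(seed).shuffle(order)

    # Collect absolute residuals of each sample from a model fit without it.
    abs_resid = []
    for k in range(n_folds):
        held_out = set(order[k::n_folds])
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = fit_line([pgs[i] for i in train],
                                    [phenotype[i] for i in train])
        abs_resid.extend(abs(phenotype[i] - (slope * pgs[i] + intercept))
                         for i in held_out)

    # Finite-sample quantile of the held-out residuals sets the half-width.
    abs_resid.sort()
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    half_width = abs_resid[rank - 1]

    # Centre the intervals on predictions from a fit to all calibration data.
    slope, intercept = fit_line(pgs, phenotype)
    return [(slope * g + intercept - half_width,
             slope * g + intercept + half_width) for g in new_pgs]
```

The interval width here is the same for everyone and depends only on how large the held-out prediction errors are, which is why no distributional assumption about the phenotype is needed. A real application would replace `fit_line` with whatever PGS method is being evaluated.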
N Nature Genetics · Oct 10, 2025 Population-scale gene-based analysis of whole-genome sequencing provides insights into metabolic health In addition to its coverage of the noncoding genome, whole-genome sequencing (WGS) may better capture the coding genome than exome sequencing. Here we sought to exploit this and identify new rare, protein-coding variants associated with metabolic health in WGS data (n = 708,956) from the UK Biobank and All of Us studies. Identified genes highlight new biological mechanisms, including protein-truncating variants (PTVs) in the DNA double-strand break repair gene RIF1 that have a substantial effect on body mass index (2.66 kg m^−2, s.e. 0.43, P = 3.7 × 10^−10). UBR3 is an intriguing example where PTVs independently increase body mass index and type 2 diabetes risk. Furthermore, PTVs in IRS2 have a substantial effect on type 2 diabetes (odds ratio 6.4 (3.7–11.3), P = 9.9 × 10^−14, 34% case prevalence among carriers) and were also associated with chronic kidney disease independent of diabetes status, suggesting an important role for IRS2 in maintaining renal health. Our study demonstrates that large-scale WGS provides new mechanistic insights into human metabolic phenotypes through improved capture of coding sequences. Genetics research Genome-wide association studies Obesity Type 2 diabetes biology
N Nature Genetics · Oct 03, 2025 A genetic map of human metabolism across the allele frequency spectrum Genetic studies of human metabolism have been limited in scale and allelic breadth. Here we provide a data-driven map of the genetic regulation of circulating small molecules and lipoprotein characteristics (249 traits) measured using proton nuclear magnetic resonance spectroscopy across the allele frequency spectrum in ~450,000 individuals. Trans-ancestral meta-analyses identify 29,824 locus–metabolite associations mapping to 753 regions with effects largely consistent between men and women and large ancestral groups represented in UK Biobank. We observe and classify extreme genetic pleiotropy, identify regulators of lipid metabolism, and assign effector genes at >100 loci through rare-to-common allelic series. We propose roles for genes less established in metabolic control (for example, SIDT2), genes characterized by phenotypic heterogeneity (for example, APOA1) and genes with specific disease relevance (for example, VEGFA). Our study demonstrates the value of broad, large-scale metabolomic phenotyping to identify and characterize regulators of human metabolism. Epidemiology Genome-wide association studies Genetics Metabolism Human Genomics Machine Learning
N Nature Genetics · Sep 18, 2025 Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, people from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UK Biobank than previous efforts, to produce freely available summary statistics for 7,266 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci (P < 5 × 10^−8) in the meta-analysis that were not found in the EUR genetic ancestry group alone, including new associations, for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside frequently asked questions that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations. Genome-wide association studies Population genetics Genetics Genomics Human Machine Learning
N Nature Genetics · Sep 17, 2025 Genome-wide association meta-analysis of childhood ADHD symptoms and diagnosis identifies new loci and potential effector genes We performed a genome-wide association meta-analysis (GWAMA) of 290,134 attention-deficit/hyperactivity disorder (ADHD) symptom measures of 70,953 unique individuals from multiple raters, ages and instruments (ADHD_SYMP). Next, we meta-analyzed the results with a study of ADHD diagnosis (ADHD_OVERALL). ADHD_SYMP returned no genome-wide significant variants. We show that the combined ADHD_OVERALL GWAMA identified 39 independent loci, of which 17 were new. Using a recently developed gene-mapping method, Fine-mapped Locus Assessment Model of Effector genes, we identified 22 potential ADHD effector genes implicating several new biological processes and pathways. Moderate negative genetic correlations (r_g < −0.40) were observed with multiple cognitive traits. In three cohorts, polygenic scores (PGSs) based on ADHD_OVERALL outperformed PGSs based on ADHD symptoms and diagnosis alone. Our findings support the notion that clinical ADHD is at the extreme end of a continuous liability that is indexed by ADHD symptoms. We show that including ADHD symptom counts helps to identify new genes implicated in ADHD. Behavioural genetics Genome-wide association studies Neuroscience Genetics Genomics Human