A database of over 5000 soybean RNA-seq samples


September 8, 2023


Soybean is a crucial crop worldwide, used as a source of food, feed, and industrial products due to its high protein and oil content. Previously, the rapid accumulation of soybean RNA-seq data in public databases and the computational challenges of processing raw RNA-seq data motivated us to develop the Soybean Expression Atlas, a gene expression database of over a thousand RNA-seq samples. Over the past few years, our database has allowed researchers to explore the expression profiles of important gene families, discover genes associated with agronomic traits, and understand the transcriptional dynamics of cellular processes. Here, we present the Soybean Expression Atlas v2, an updated version of our database with a fourfold increase in the number of samples, featuring transcript- and gene-level abundance matrices for 5481 publicly available RNA-seq samples. New features include transcript-level abundance estimates and equivalence classes to explore differential transcript usage, abundance estimates in bias-corrected counts to increase the accuracy of differential gene expression analyses, a new web interface with improved data visualization and user experience, and a reproducible and scalable pipeline available as an R package. The Soybean Expression Atlas v2 is available at, and it will accelerate soybean research, empowering researchers with high-quality and easily accessible gene expression data.