Results based on the full FinnGen cohort of 500,000 participants released
In order to make the biggest possible contribution to global biomedical research, complete results and summary statistics of the FinnGen GWAS analyses are made available publicly to the entire scientific community following a one-year embargo. Since 2020, when the first results were released, FinnGen has received more than 27,000 download requests.
"We are committed to making the results of this unprecedented study cohort available to researchers worldwide, fostering studies that will propel new genetic discoveries and approaches to human health. The strong response to our open-access resources underscores the value of this data for the scientific community, and we’re eager to see the insights that emerge," said Aarno Palotie, Scientific Director of the FinnGen study.
As with earlier releases, the results can be browsed online with the PheWeb browser, and the summary statistics can be downloaded free of charge from Google Cloud Storage. Please visit our Access results page for instructions.
"This achievement would not have been possible without the remarkable dedication and expertise of our core team, the support of Finnish biobanks and clinical experts, and, most importantly, the contribution of our study participants," Palotie said. “Their collective efforts have been invaluable in establishing FinnGen as a leading resource for global genetic research."
This release (R12) consists of:
• A total of 500,348 individuals, including 282,064 females and 218,284 males
• 2,502 health endpoints
• More than 21 million variants
Basic characteristics of the final cohort
The median age of the participants when donating the biobank sample was 53 years. About 218,000 samples (44%) are from men and 282,000 (56%) from women. The cohort is composed of samples from all Finnish biobanks. The cohort includes ~315,000 patients from health care and specialist health care, ~58,000 healthy blood donors, ~2,000 hematological cancer patients, and ~180,000 participants from population-based or disease-based national studies.
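The rounded shares quoted above follow directly from the R12 release counts. A minimal Python sketch of that arithmetic, with variable names chosen purely for illustration:

```python
# Participant counts as stated for release R12.
females = 282_064
males = 218_284
total = females + males

# Share of each group, rounded to the nearest whole percent.
female_pct = round(100 * females / total)
male_pct = round(100 * males / total)

print(total, female_pct, male_pct)
```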
Health information
Most of the phenotype data in FinnGen comes from national health registers covering the entire lifespan of the study subjects. Combining different types of register data, such as ICD-10 diagnosis codes, drug prescription data and causes of death, makes it possible to construct reliable disease endpoints as well as novel long-term phenotypes of disease progression and therapeutic response.
Throughout the FinnGen project, significant effort has gone into creating meaningful clinical endpoints based on the digital health record data. These endpoints are publicly available, and the research community can learn more about best practices for forming specific disease endpoints through the Risteys tool or by visiting the Clinical endpoints page. FinnGen is also working on the conversion of Finnish ICD codes to the standard OMOP-CDM data model, in order to maximise interoperability with other global biomedical research efforts.
Genotype information
Genome variant data from most of the samples has been produced using a customised genotyping chip with about 700,000 markers. In addition, FinnGen includes existing genotypes from approximately 80,000 individuals from previous studies.
To enhance the utility of both the custom chip and legacy genotypes, FinnGen performed imputation on all samples using a reference dataset derived from the whole-genome sequences of approximately 8,700 Finnish individuals. This process yielded an inferred genome sequence, allowing for the analysis of a comprehensive set of approximately 21 million variants per individual. Because of Finland's distinct population structure, imputation accuracy is notably higher than in more genetically diverse populations. This accuracy extends to relatively rare variants, including many disease mutations that are specific to the Finnish heritage.