FinnGen 1-2
In FinnGen phase 3, which began in August 2023, the resource is deepened with more data to understand the onset, progression and treatment response of the diseases discovered in the first stages of the project.
FinnGen 1 (years 2017-2020)
The FinnGen project began in 2017 with the collection of study samples from Finnish biobanks. Approximately 185 000 “legacy samples” came from previous studies and were already stored in the biobanks, and 335 000 samples were collected during the first six years of FinnGen by the Finnish biobanks. FinnGen genotyped the samples and connected them to longitudinal health data from different national registers including information on birth, death, diseases, hospital visits and purchased drugs.
At the same time, FinnGen set up a secure, audited computing environment for analysing, visualising, storing and sharing the data and results. The resulting Google Cloud environment fulfils all security and data protection requirements and has two main components: one that contains analysis results in an easily browsable form (no individual-level data) and another, the Sandbox, which enables analyses of individual-level genetic and registry-based phenotype data.
During the first six years of FinnGen, a new data set of genotype and phenotype data on the sampled individuals was produced and released into the Sandbox for analysis by the FinnGen partners every six months (a “data freeze”). The first data freeze, consisting of approximately 52 000 individuals, was released in February 2018. Each subsequent freeze added approximately 35 000 to 50 000 genotyped individuals.
GWAS and PheWAS results for several thousand disease endpoints (phenotypes defined by clinical experts based on the register data), as well as fine-mapping and colocalization results for the GWAS hits, autoreporting (automatic annotation) of the GWAS results, meta-analyses (with UK Biobank and the Estonian Biobank) and variant annotations, were also generated centrally every six months. One year after each data freeze, the GWAS results were released for use by the entire scientific community.
FinnGen 2 (years 2020-2023)
The core activities of FinnGen 1 continued in FinnGen 2 with the release of phenotype and genotype data and the results of core analyses every six months (followed by a public release of the results one year later) and with continued maintenance of the computing environment and the development of analysis tools.
The FinnGen 1 and 2 sample collection period culminated in the release of the final dataset in September 2023, which is when FinnGen hit its target of sampling close to 10% of the Finnish population with a total of 520 000 genotyped and phenotyped individuals.
During FinnGen 2, pilot studies were also conducted to expand the research in six areas. These Expansion Areas (EA) were: EA1 - targeted recruitment to enhance recruitment in selected areas in the hospital biobanks; EA2 - recall of FinnGen study subjects to answer a health questionnaire and take a cognitive test; EA3 - phenotype enrichment with clinical data in selected disease areas; EA4 - FinnGen replication study involving the meta-analysis of 500 endpoints; EA5 - pipeline for sample collection for blood analysis and functional characterization of the Finnish-enriched alleles with proteomics or snRNASeq + ATACSeq from a subset of samples; EA6 - proof-of-concept study on cognitive decline in prodromal Alzheimer’s. The results of the EA studies informed the design of FinnGen 3.