Core Analyses
Large biobank studies have become an important resource for biomedical research, particularly in genetics and epidemiology. Among these, the FinnGen project stands out as one of the largest in the world, capitalising on unique opportunities in Finland. FinnGen combines genome data and longitudinal health data from more than 500 000 Finns, almost 10% of the Finnish population. The aim of the project is to apply cutting-edge genetic analyses to this combined data resource to improve our understanding of disease mechanisms, and thereby to provide insights that facilitate the development of better ways of treating and preventing diseases.
Whereas other biobank studies may make their data resources available, FinnGen was among the first to explicitly release detailed analysis results as part of its routine deliverables.
FinnGen Partners can still perform their own customised analyses in the Sandbox environment, but providing extensive core analyses makes the resource more immediately valuable to a broader community of researchers and clinicians and eliminates much redundant work.
Indeed, most day-to-day use of FinnGen relies on prepared results available in the PheWeb browser. These include genome-wide association study (GWAS) results for over 2 000 disease endpoints (defined with advice from the Finnish clinical community), concise summaries of the full set of results for each variant and gene of interest, colocalization results both within the resource and with other biobank and eQTL resources, meta-analysis results with UKBB, and fine-mapping data.
The core analysis activities of FinnGen 1 and 2 have yielded ca. 20 000 genetic associations with disease risk (ca. 8 000 unique genetic loci), 7% of which are in protein-coding regions. More than 150 diseases are associated with Finnish-enriched coding variants, and close to 1 000 of the associations involve Finnish-specific coding variants. A special feature of FinnGen is that it enables the study of pleiotropy: each variant is analysed against over 4 500 disease endpoints, making it possible to discover cases in which the same variant affects multiple, sometimes unrelated, diseases.