Genetic data

Genome variant data for most of the samples has been produced using a customised genotyping chip with about 700,000 markers. In addition, FinnGen includes existing genotypes from approximately 80,000 individuals from previous studies.

The FinnGen ThermoFisher Axiom custom chip array (v2) contains 723,376 probe sets for 664,510 genetic markers. In addition to the core GWAS markers (about 500,000), the chip contains about 116,000 coding variants enriched in Finland, more than 10,000 specific markers for the HLA/KIR region, almost 15,000 ClinVar variants, about 4,600 pharmacogenomic variants and 57,000 selected markers that were of special interest to the partners. Approximately 121,000 unique samples were genotyped with v1 of the custom chip array and 328,000 with v2. For v2 of the array, non-functional probe sets were removed and new custom probe sets were added, including mitochondrial and Y-chromosomal markers as well as markers of special interest to partners.
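As a quick sanity check on the figures above, the following minimal Python sketch adds up the quoted category counts and computes the average number of probe sets per marker. The dictionary keys and the `summarize` helper are illustrative names, not part of any FinnGen tooling, and the category counts are the rounded values from the text ("more than 10,000" and "almost 15,000" are taken as 10,000 and 15,000).

```python
# Approximate marker counts per content category, as quoted in the text.
# These are rounded figures, so the total is only indicative.
CATEGORY_COUNTS = {
    "core GWAS": 500_000,
    "Finnish-enriched coding": 116_000,
    "HLA/KIR": 10_000,        # text gives a lower bound
    "ClinVar": 15_000,        # text says "almost 15,000"
    "pharmacogenomic": 4_600,
    "partner-selected": 57_000,
}

PROBE_SETS = 723_376  # v2 probe sets, from the text
MARKERS = 664_510     # v2 genetic markers, from the text


def summarize():
    """Return (sum of rounded category counts, probe sets per marker)."""
    approx_total = sum(CATEGORY_COUNTS.values())
    probe_sets_per_marker = PROBE_SETS / MARKERS
    return approx_total, probe_sets_per_marker


if __name__ == "__main__":
    total, ratio = summarize()
    print(f"Sum of rounded category counts: {total:,}")
    print(f"Average probe sets per marker: {ratio:.3f}")
```

The rounded categories sum to roughly 700,000, which is somewhat more than the 664,510 distinct markers. That would be consistent with rounding and with some markers being counted in more than one category, although the source does not state this.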

The marker content of the FinnGen ThermoFisher Axiom custom array v2 (current version) can be downloaded here. The content for v1 (657,675 markers) is downloadable here.

FinnGen also uses "legacy genotypes": chip genotypes generated from biobank samples before FinnGen began. These samples were genotyped over time using various generations of Illumina GWAS arrays and come primarily from the National Institute for Health and Welfare (THL) biobank.

To enhance the utility of both the custom chip and legacy genotypes, FinnGen imputed all samples using a reference dataset derived from Finnish whole-genome sequences of approximately 8,700 individuals. This process yielded an inferred genome sequence, allowing analysis of a comprehensive set of approximately 21 million variants per individual. Because of Finland's distinct population structure, imputation accuracy is notably higher than in more genetically diverse populations and extends to relatively rare variants, including many disease mutations specific to the Finnish heritage.

Additionally, the FinnGen study cohort includes exome sequencing variants from approximately 25,000 study subjects and genome sequencing variants from around 6,000. These datasets further enrich the genetic resources available for the FinnGen study and have been acquired primarily through the THL biobank.