Big data accumulated from biomedical and agronomic studies offers the potential to identify genes controlling complex human diseases and agriculturally important traits through genome-wide association studies (GWAS). However, big data also poses serious computational challenges, especially when complex statistical models are employed to reduce false positives and false negatives simultaneously. The recently developed Fixed and random model Circulating Probability Unification (FarmCPU) method uses a bin method under the assumption that quantitative trait nucleotides (QTNs) are evenly distributed throughout the genome. The estimated QTNs are used to separate a mixed linear model into a computationally efficient fixed effect model (FEM) and a computationally expensive random effect model (REM), which are then used iteratively. To eliminate the computationally expensive REM entirely, we replaced the REM with an FEM by using Bayesian information. To remove the requirement that QTNs be evenly distributed throughout the genome, we replaced the bin method with linkage disequilibrium information. The new method is called Bayesian and Linkage disequilibrium information Iteratively Nested Knitting (BLINK).
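The two replacements described above can be illustrated with a minimal sketch. It is not the BLINK implementation: it assumes that "Bayesian information" refers to the Bayesian information criterion (BIC), that linkage disequilibrium is measured as the squared Pearson correlation between marker genotypes, and that candidate markers arrive already ranked by significance. The function names, the 0.7 threshold, and the toy data are hypothetical, and the sketch performs a single pass rather than the iterative procedure.

```python
import numpy as np


def ld_prune(genotypes, order, r2_threshold=0.7):
    """Walk markers in `order` and keep each one only if its squared
    correlation with every already-kept marker is at or below the threshold.
    The threshold value is an assumption made for this sketch."""
    kept = []
    for j in order:
        in_ld = False
        for k in kept:
            r = np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1]
            if r * r > r2_threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept


def bic(y, X):
    """BIC of an ordinary least-squares fit of y on X (Gaussian errors)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def select_by_bic(y, genotypes, candidates):
    """Fit fixed effect models on the first k candidates (k = 0, 1, ...)
    and return the prefix of candidates with the lowest BIC."""
    n = len(y)
    best_k = 0
    best_score = bic(y, np.ones((n, 1)))  # intercept-only model
    for k in range(1, len(candidates) + 1):
        X = np.column_stack([np.ones(n), genotypes[:, candidates[:k]]])
        score = bic(y, X)
        if score < best_score:
            best_score, best_k = score, k
    return candidates[:best_k]


# Toy data (hypothetical): 200 individuals, 5 markers coded 0/1/2.
rng = np.random.default_rng(0)
n = 200
genotypes = rng.integers(0, 3, size=(n, 5)).astype(float)
genotypes[:, 1] = genotypes[:, 0]  # marker 1 is in perfect LD with marker 0
phenotype = 2.0 * genotypes[:, 0] + rng.normal(0.0, 1.0, size=n)

order = [0, 1, 2, 3, 4]  # assumed ranking, most significant first
kept = ld_prune(genotypes, order)
selected = select_by_bic(phenotype, genotypes, kept)
```

In this toy setup, marker 1 duplicates marker 0 and is therefore dropped by the pruning step, and the causal marker 0 is retained by the BIC step. Because every model here is a fixed effect regression, no random effect model needs to be fitted, which is the source of the computational saving the text describes.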
Analyses of both real and simulated data demonstrated that BLINK improves statistical power compared to existing methods while achieving very high computing efficiency. An extremely large dataset (e.g., one million individuals and one million markers), which would take weeks to analyze with FarmCPU, can now be analyzed within hours with BLINK.
BLINK is available to demo and license at our online store.