A single, large-scale study is needed to reliably quantify the effects of common genetic variation on cardiovascular disease and other complex traits. Such studies are logistically complex and were not possible until recently. The Metabochip[@b1] is a cost-effective and user-friendly array of 196,228 SNPs that have been replicated in GWAS for cardiovascular and metabolic traits. It includes approximately two-thirds of the SNPs associated with these traits with a minor allele frequency (MAF) ≥ 0.05. The Metabochip was validated in several multi-ethnic populations[@b2][@b3][@b4][@b5] and has been applied to large studies of diabetes and other cardiovascular risk factors in participants from Europe and the United States[@b6][@b7]. For all of these studies, one or more phenotypes such as weight, height, or lipid levels have been collected. As a consequence, the effects of these SNPs on the trait(s) of interest can be directly evaluated and we can gain insight into the underlying biology by analyzing the association results. This is especially valuable in the analysis of a quantitative trait because the trait heritability can be partitioned into its genetic and environmental components (heritability = genetic variance/(genetic variance + environmental variance) in a pedigree). Because the goal of large-scale GWAS is to discover genetic factors influencing disease risk, it is important to understand their functional consequences to provide insight into the biology of the disease. The use of gene-based analyses, which test the association of one or more genes with disease risk or a continuous trait (see review[@b8]), provides a powerful approach to this goal. The interpretation of gene-based tests is not straightforward, however. Genetic risk that is localized in one region (i.e. a haplotype block) can be partitioned into that residing within a gene and that outside of the gene. 
Although there may be only a few variants in a block with strong association, all possible partitionings of the variants are tested, and the number of SNPs is large. In addition, multiple tests for association must be considered: for each gene a separate test is performed. As a consequence, multiple testing adjustments are required for gene-based tests (see review[@b9]). It is important to emphasize that the number of independent tests for a GWAS depends on the number of samples and the number of SNPs in the genome, which increase at an exponential rate with sample size. Therefore, the use of a conservative threshold will result in decreased power to detect association. However, this conservative threshold should also reduce the likelihood that spurious associations are reported.

Previous gene-based tests for cardiovascular traits have focused on individual genes. These tests have been used to evaluate gene-by-smoking interaction effects on serum triglyceride (TG) levels in participants from 21 studies in the Collaborative Study on the Genetics of Alcoholism (COGA), which comprises 15,938 unrelated individuals[@b10]. A weighted gene-based test of associations with triglycerides was performed in COGA, although the weighting procedure requires careful selection. Cerebral and peripheral atherosclerosis were tested in a case-control study of 3,748 participants, where single SNPs in each gene were tested individually and each test was adjusted for 22 principal components[@b11]. A set of 12 genes associated with LDL-cholesterol (LDL-C) levels was identified in a large sample of the Quebec family study[@b12]. A weighted gene-based test based on SNPs from 26 genes was used to identify genes associated with mean corpuscular volume (MCV) levels, but no gene-based test statistics were presented for a second trait (mean corpuscular hemoglobin concentration (MCHC))[@b13].
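
The multiple-testing adjustment for gene-based tests discussed above can be made concrete with a minimal sketch of a Bonferroni correction, in which the family-wise significance level is divided by the number of genes tested. The function names and the p-values below are hypothetical and serve only as an illustration; they are not taken from the analyses in this paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_genes(gene_pvalues: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return genes passing the Bonferroni threshold, ordered by p-value."""
    threshold = bonferroni_threshold(alpha, len(gene_pvalues))
    hits = [gene for gene, p in gene_pvalues.items() if p < threshold]
    return sorted(hits, key=gene_pvalues.get)


# Hypothetical gene-level p-values (illustration only, not study results).
example = {"GENE_A": 1e-6, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.6}

print(bonferroni_threshold(0.05, len(example)))
print(significant_genes(example))
```

Because the threshold shrinks in proportion to the number of tests, this correction is conservative when the tests are correlated, which is the trade-off between power and spurious findings noted above.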
The gene-based tests presented in these studies are all based on the asymptotic null distribution of SNP effects. As discussed by Skol *et al.*[@b14], the performance of statistical tests is substantially affected by gene size, small sample sizes, and deviations from the asymptotic null distribution, in which all of the SNPs are independent. Large genes will have fewer independent tests because many of the possible haplotype configurations are highly correlated. When there are very few independent tests, the null distribution will be largely determined by a few significant tests. The presence of closely linked SNPs, each of which is individually associated with a particular trait (known as genetic linkage), leads to a small number of significant tests[@b14][@b15]. When using gene-based tests, the genetic linkage leads to a very small number of independent tests. These small sample size and linkage effects should be addressed to protect against spurious associations in GWAS and gene-based tests.

Our goal is to expand upon the idea of using genetic linkage to increase statistical power in GWAS in the presence of small sample size and apply it to gene-based tests of association. The methodology outlined in this paper can be applied to all traits and studies, regardless of the number of SNPs that have been genotyped.

Results
=======

Single gene analyses of height
------------------------------

[Figure 1](#f1){ref-type="fig"} presents the results from the gene-based tests for height. The most significant association, identified by an all-gene (global) test for association, was with the GHRL gene. [Figure 2](#f2){ref-type="fig"} presents a Q-Q plot of the all-gene test statistics against their distribution under the null hypothesis of no association. Because there is significant linkage disequilibrium (LD) between SNPs in the GHRL gene, there is little evidence of association in the actual test statistics.
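
To illustrate how LD within a gene reduces the number of independent tests, the following is a minimal sketch of one widely used eigenvalue-based estimate, in the style of Nyholt's correction: M_eff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the SNP correlation matrix. This is an assumption made for illustration and is not necessarily the procedure used in this paper; the function names and toy genotype matrices are hypothetical.

```python
import numpy as np


def effective_number_of_tests(genotypes: np.ndarray) -> float:
    """Eigenvalue-based estimate of the number of independent SNP tests.

    genotypes: (samples x SNPs) matrix of allele counts. Every SNP must be
    polymorphic, otherwise its correlation with other SNPs is undefined.
    """
    n_snps = genotypes.shape[1]
    if n_snps == 1:
        return 1.0
    ld = np.corrcoef(genotypes, rowvar=False)   # pairwise SNP correlation
    eigenvalues = np.linalg.eigvalsh(ld)        # ld is symmetric
    var_lambda = np.var(eigenvalues, ddof=1)    # sample variance
    return 1.0 + (n_snps - 1) * (1.0 - var_lambda / n_snps)


def adjusted_threshold(genotypes: np.ndarray, alpha: float = 0.05) -> float:
    """Bonferroni-style threshold using the effective number of tests."""
    return alpha / effective_number_of_tests(genotypes)


# Toy data (illustration only): three SNPs in perfect LD, two SNPs in no LD.
perfect_ld = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [1, 1, 1]], dtype=float)
no_ld = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

print(effective_number_of_tests(perfect_ld))
print(effective_number_of_tests(no_ld))
```

Under this estimate, SNPs in perfect LD collapse to approximately one effective test, while uncorrelated SNPs each count as a separate test, so the adjusted threshold is less stringent for genes with strong LD.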
We found that the number of independent tests in the GHRL gene is only one, and therefore the true significance of the gene-based tests is inflated. The inflation factor is 1.06, which suggests that using a conservative threshold (e.g., Bonferroni adjusted p-value \<1.1 × 10^−5^) would provide reasonable protection against type I error. There are no indications of strong linkage disequilibrium in the gene regions surrounding GHRL and the other genes shown in [Fig. 1](#f1){ref-type="fig"}, although this is not to suggest there is no association in these regions.

Single gene analyses of body mass index
---------------------------------------

[Figure 3](#f3){ref-type="fig"} presents results from gene-based tests of BMI. A large number of significant associations were identified in the region between IGF1 and IGFBP3. To address the confounding of genetic linkage in the region, we created a new variable (MIG) to represent the number of independent tests in each gene region using an algorithm that was originally proposed by Wellek and Chaloner[@b16]. [Figure 4](#f4){ref-type="fig"} presents the same gene-based association results as [Fig. 3](#f3){ref-type="fig"} but now stratified by the number of independent tests in each gene region (MIG). The gene-based association results are strongly driven by the GHRL gene, which has an extremely high MIG. [Figure 5](#f5){ref-type="fig"} presents the Q-Q plot of the gene-based test statistics for BMI stratified by MIG. The points for the GHRL gene are all outside of the null distribution and are therefore far from a normal distribution. Again, the inflation factor is 1.06, which suggests that using a conservative threshold would provide reasonable protection against type I error.

Single gene analyses of lipid traits
------------------------------------

[Figure 6](#f6){ref-type="fig"} presents the results from gene-based tests for plasma lipid traits. The tests for TG were driven by a single region between APOA5 and APOA4.
Similarly, the lipid traits HDL-C and total cholesterol (TC) had significant associations in the regions containing LDLR, APOC4, and CETP. There was no obvious pattern of SNP clusters that would explain the signal observed for LDL-C, which has the largest number of significant SNPs, and MCHC, which has the smallest number of significant SNPs. [Figure 7](#f7){ref-type="fig"} presents Q-Q plots of the gene-based test statistics for HDL-C and LDL-C using MIG. The inflation factors for the gene-based tests are 1.07 (HDL-C) and 1.07 (LDL-C), suggesting that using a conservative threshold would provide reasonable protection against type I error.

Inference of epistasis
----------------------

We performed a two-locus interaction analysis of APOA5, apolipoprotein C4 (APOC4), adiponectin, cholesteryl ester transfer protein (CETP), and glycosaminoglycan (GAG), as well as a three-locus interaction analysis of APOA5, APOC4, and CETP. We evaluated several single-SNP tests for interaction, including logistic regression and the likelihood ratio test (LRT), but chose the generalized linear model (GLM) test, which adjusts for environmental factors, for the SNP-by-SNP analysis because it has greater power than logistic regression and the LRT for rare variants[@b17]. Although interaction effects can be identified using all of these tests, including those that explicitly model genetic markers and their interactions, our interest is in validating interactions that have been previously reported. Our approach is to test for interaction by conditioning on SNP and environmental main effects. This approach assumes that any interactions that exist act via the main effects of SNPs and confounders. The results from this analysis are presented in [Fig. 8](#