Saturday, May 22, 2010
Franklin Hall B Level 4 (Philadelphia Marriott Downtown)
9:00 AM
Background: GWAS typically focus on the SNP level. This is the most sensible approach if the causal variant is in LD with the same SNP in every individual; but if the causal variant is linked to the same gene yet in LD with different SNPs in different individuals, a Gene-centric approach is more sensible. In addition, the noise introduced by different genetic backgrounds, which obscures the signal, is usually not factored into GWAS analysis. We propose a GWAS analysis method that addresses both limitations. The method clusters individuals based on genetic similarity, applies an appropriate genetic association test to each cluster, ranks SNPs by the results of the test, selects the genes corresponding to the top-ranking associated SNPs, and then evaluates the gene overlap between population clusters. This approach can uncover signal that would otherwise be missed if: 1) the causal variant is associated with different SNPs in different populations because of different LD patterns, or 2) the causal variant differs between populations because of divergent evolutionary histories.
Objectives: To find genes associated with autism, applying a Gene-centric approach to the AGP data and factoring the genetic background of the individuals into the analysis.
Methods: The AGP data set consists of 1,399 trios genotyped at 1,072,820 SNPs using Illumina technology. A total of 18,946 genes are defined using the Illumina-based assignment. We cluster the probands based on genetic similarity in two steps: 1) run Eigensoft to obtain PCs, and 2) apply the Hopach algorithm to the PCs. We then select genes in four steps: 1) apply a family-based TDT test (Plink) to each cluster-based population, 2) rank each SNP by TDT p-value, 3) select the top 1000 ranked SNPs in each population, and 4) select genes with top-ranked SNPs in multiple populations (R). We are currently analyzing the haplotypes of the relevant gene regions (Beagle) and investigating their differential distribution in parents vs. children (R).
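The ranking and overlap steps (2 to 4) can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the toy p-values, the SNP-to-gene map, and the thresholds `n` and `min_clusters` are all assumptions chosen for illustration, and the input is assumed to be one dictionary of TDT p-values per cluster.

```python
from collections import defaultdict


def top_snps(pvalues, n=1000):
    """Return the set of the n SNP IDs with the smallest p-values."""
    ranked = sorted(pvalues, key=pvalues.get)
    return set(ranked[:n])


def genes_shared_across_clusters(cluster_pvalues, snp_to_gene, n=1000, min_clusters=2):
    """Map each gene to the clusters in which one of its SNPs is top-ranked.

    Only genes supported by at least `min_clusters` clusters are returned.
    """
    clusters_per_gene = defaultdict(set)
    for cluster, pvalues in cluster_pvalues.items():
        for snp in top_snps(pvalues, n):
            gene = snp_to_gene.get(snp)  # SNPs outside any gene are skipped
            if gene is not None:
                clusters_per_gene[gene].add(cluster)
    return {
        gene: sorted(clusters)
        for gene, clusters in clusters_per_gene.items()
        if len(clusters) >= min_clusters
    }


# Hypothetical toy data: three clusters, five SNPs, three genes.
cluster_pvalues = {
    "A": {"rs1": 1e-5, "rs2": 0.2, "rs3": 0.01},
    "B": {"rs1": 0.5, "rs2": 0.001, "rs4": 0.0001},
    "C": {"rs3": 0.03, "rs4": 0.9, "rs5": 0.002},
}
snp_to_gene = {
    "rs1": "GENE1", "rs2": "GENE1", "rs3": "GENE2",
    "rs4": "GENE3", "rs5": "GENE2",
}

shared = genes_shared_across_clusters(cluster_pvalues, snp_to_gene, n=2, min_clusters=2)
print(shared)
```

In this toy example GENE1 is reached through different SNPs in clusters A and B, which is the situation a purely SNP-level comparison across clusters would not detect.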
Results: Our method produces eight clusters; the individuals recruited in Portugal are mostly placed into one cluster, and the individuals recruited in Ireland are placed into their own cluster. This was expected, since these two populations are historically the most isolated of the AGP sites. Self-reported ancestry validates the clustering results, and an FST analysis for all pairs of clusters supports them. One hundred and seventy genes were selected according to three criteria: 1) two or more clusters have the same single SNP ranking in the top 1000 TDT p-values, in which case we select the gene corresponding to this SNP; 2) four or more clusters have a SNP ranking in the top 1000 TDT p-values among the SNPs mapping to the gene; and 3) three or more clusters have SNPs ranking in the top 1000 p-values in a region we have defined as contiguous-high-ranked-SNPs. The selected genes include axon guidance, CNS development, and autism-related genes. We are currently analyzing these genes in detail, investigating the differential distribution of haplotypes in parents vs. children.
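Criteria 1 and 2 can be sketched in Python as follows. This is a hedged illustration only: the input is assumed to be the set of top-ranked SNPs per cluster, the names `select_genes`, `same_snp_min`, and `any_snp_min` and the toy data are hypothetical, and criterion 3 is omitted because the abstract does not define the contiguous-high-ranked-SNPs region.

```python
from collections import defaultdict


def select_genes(top_by_cluster, snp_to_gene, same_snp_min=2, any_snp_min=4):
    """Return {gene: [criteria met]} for criteria 1 ("same_snp") and 2 ("any_snp")."""
    snp_clusters = defaultdict(set)   # SNP  -> clusters where it is top-ranked
    gene_clusters = defaultdict(set)  # gene -> clusters with any top-ranked SNP in it
    for cluster, snps in top_by_cluster.items():
        for snp in snps:
            gene = snp_to_gene.get(snp)
            if gene is None:  # SNP not assigned to a gene
                continue
            snp_clusters[snp].add(cluster)
            gene_clusters[gene].add(cluster)

    selected = defaultdict(set)
    # Criterion 1: the same SNP is top-ranked in at least same_snp_min clusters.
    for snp, clusters in snp_clusters.items():
        if len(clusters) >= same_snp_min:
            selected[snp_to_gene[snp]].add("same_snp")
    # Criterion 2: any SNP of the gene is top-ranked in at least any_snp_min clusters.
    for gene, clusters in gene_clusters.items():
        if len(clusters) >= any_snp_min:
            selected[gene].add("any_snp")
    return {gene: sorted(reasons) for gene, reasons in selected.items()}


# Hypothetical toy data: four clusters and their top-ranked SNPs.
top_by_cluster = {
    "c1": {"rs1", "rs10"},
    "c2": {"rs1", "rs11"},
    "c3": {"rs12", "rs20"},
    "c4": {"rs13"},
}
snp_to_gene = {
    "rs1": "GENE_A", "rs10": "GENE_B", "rs11": "GENE_B",
    "rs12": "GENE_B", "rs13": "GENE_B", "rs20": "GENE_C",
}

selected = select_genes(top_by_cluster, snp_to_gene)
print(selected)
```

Here GENE_A qualifies because one SNP recurs in two clusters, while GENE_B qualifies only at the gene level, with a different top-ranked SNP in each of four clusters.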
Conclusions: Combining a Gene-centric approach with genetic background in GWAS analysis uncovers genes that might be associated with autism; these genes would not be selected by the most commonly used SNP-centric TDT approach.