Brown University -
Division of Biology and Medicine,
Center for Statistical Sciences Seminar
Department of Biological Statistics and Computational Biology, Candidate for Assistant Professor (tenure track) in the Public Health Program/Department of Community Health
Abstract: High-throughput biotechnologies, such as microarrays and mass spectrometry, simultaneously monitor the activities of thousands of genes at the RNA and protein levels. Statistically, the challenge is to estimate high-dimensional parameters efficiently from noisy data. Furthermore, the signals in these large-scale genomic and proteomic analyses are sparse and asymmetric. Here we propose a generalized shrinkage estimator based on empirical Bayesian thresholding, which adapts to the sparseness and possible asymmetry of the signals. The properties of this estimator have been investigated. A simulation study and an application to microarray data demonstrate the performance of our approach.
Identifying polygenic effects on complex traits and profiling molecular features for clinical outcomes pose another challenging statistical issue: selecting variables with large $p$, small $n$ data. Likewise, the sparseness and possible asymmetry of the signals are the most important characteristics of large $p$, small $n$ data, and they should be exploited because of the limited sample size and/or the biological implications. We develop a Bayesian model selection approach to incorporate this {\it a priori} information. A heat map is proposed to help researchers make informed decisions and control the false discovery rate. This approach has been successfully applied to reveal sex-specific QTL underlying the difference in glucose-6-phosphate dehydrogenase enzyme activity between two {\it Drosophila} species.
Ph.D. Candidate (Biostatistics), Candidate for Post Doctoral Research Associate in the Center for Statistical Sciences
Abstract: The last decade has seen an explosion of interest in disease mapping, with the increasing availability of Geographic Information System (GIS) technology and spatial databases. For example, the databases from the National Center for Health Statistics (NCHS) and from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute, publicly available to anyone with a web browser, provide an enormous supply of georeferenced data. Conditionally autoregressive (CAR) models (Besag et al., 1991) have been widely used for single disease mapping with such data. But when we simultaneously map multiple diseases, a multivariate areal model may be needed to permit modeling of dependence between diseases while maintaining spatial dependence across regions. Existing methods for multivariate areal data (see, e.g., Kim et al., 2001; Carlin and Banerjee, 2003; Gelfand and Vounatsou, 2003) typically suffer from unnecessary restrictions on the covariance structure. In this talk, we propose a class of Bayesian hierarchical models for multivariate areal data that avoids these restrictions, permitting flexible modeling of correlations both between diseases and across areal units. Our framework encompasses a rich class of multivariate conditionally autoregressive (MCAR) models that are computationally feasible via modern Markov chain Monte Carlo (MCMC) methods. We illustrate the strengths of our approach over existing models using simulation studies, and also offer a real-data application to disease mapping that involves annual lung, larynx, and esophagus cancer death rates in Minnesota counties between 1990 and 2000.
NOTE: This work is joint with Bradley P. Carlin and Sudipto Banerjee of the Division of Biostatistics, University of Minnesota.
Department of Biochemistry and Molecular Medicine, Candidate for Assistant Professor (tenure track) in the Public Health Program/Department of Community Health
Abstract: Given a large number of predictors in a regression, it is often desirable to reduce the dimensionality of the problem by replacing the original high-dimensional data with a low-dimensional space composed of a few key predictors or linear combinations of predictors. In this talk, I will first introduce the general framework of sufficient dimension reduction (SDR), which targets the reduction of dimension without losing any information on the conditional distribution of response given predictors, and without pre-specifying any parametric model. Two specific works within the SDR framework will be examined. The first is a model-free variable selection approach, which identifies contributing predictors prior to any model formulation. The second is the application of SDR to a microarray survival data analysis, where the goal is to predict the patients' survival time using gene expression profiles. Some related SDR methodological work and genomic studies will also be briefly reviewed.
Cornell University, Candidate for Assistant Professor (tenure track) in the Public Health Program/Department of Community Health
Structured Covariance Matrices
Abstract: I will provide a general reference prior approach for linear models with a variety of covariance structures, as well as an all-purpose MCMC algorithm for its implementation. Several reference priors developed in the literature are shown to be special cases of the proposed reference prior. This general reference prior can be used to construct inferences for penalized smoothing splines, carry out Bayesian analysis of Gaussian graphical models, address various issues in linear mixed models, etc. We found that a unique shrinkage property makes the reference prior a particularly good default choice for penalized spline smoothing. I will also show how the proposed reference prior can be used for inference in generalized linear mixed models.