Center for Statistical Science
Abstract: An important question in social network analysis is whether the observed association in individual characteristics between connected individuals in a network is a consequence of social influence (i.e., forces that act once the tie is formed) or of the tendency of similar (dissimilar) individuals to form (break) ties. The latter mechanism is known as homophily and is commonly described as "birds of a feather flock together." In this talk we describe an approach for estimating the magnitude of the homophily effect, and examine the extent to which ties in a social network form or dissolve as a function of the similarity (or lack thereof) of individuals' health-related traits. We also investigate whether the health traits have greater effects than the non-health "unchangeable" traits. If so, this would suggest that traits affected by sociological phenomena have a greater influence on the dynamic behavior of the network. The health behaviors and traits considered are body mass index (BMI), smoking, drinking, exercise, depression, and hypertension. For comparison, we also consider two immutable non-health traits: height and handedness (left- or right-handed).
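As a rough illustration of the question the abstract poses, the sketch below compares how often ties form between pairs who share a trait and pairs who do not, using two observation waves of a network. It is only a crude descriptive contrast, not the estimation approach presented in the talk. The function name `tie_formation_rates`, the two-wave layout, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def tie_formation_rates(trait, ties_t1, ties_t2):
    """Return (rate_same, rate_diff): the share of pairs untied at wave 1
    that are tied at wave 2, split by whether the pair shares the trait."""
    # same_trait -> [ties formed, pairs at risk of forming a tie]
    counts = {True: [0, 0], False: [0, 0]}
    for a, b in combinations(sorted(trait), 2):
        pair = frozenset((a, b))
        if pair in ties_t1:
            continue  # already tied at wave 1, so not at risk of forming
        same = trait[a] == trait[b]
        counts[same][1] += 1
        if pair in ties_t2:
            counts[same][0] += 1

    def rate(formed, at_risk):
        return formed / at_risk if at_risk else float("nan")

    return rate(*counts[True]), rate(*counts[False])


# Toy, made-up data: a binary trait (e.g. smoker = 1) and undirected ties.
smoker = {"a": 1, "b": 1, "c": 0, "d": 0}
wave1 = {frozenset(("a", "c"))}
wave2 = {frozenset(("a", "c")), frozenset(("a", "b")), frozenset(("c", "d"))}

same_rate, diff_rate = tie_formation_rates(smoker, wave1, wave2)
print(same_rate, diff_rate)
```

A higher formation rate among trait-sharing pairs is consistent with homophily, but a raw contrast like this ignores network structure and confounding, which is why a model-based estimate is needed in practice.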
Center for Computational Molecular Biology
Abstract: Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. Coupled with well-established high-throughput proteomics workflows, tandem mass spectrometry search engines make identifying the major constituent proteins in clinical samples straightforward. However, driven by increasingly sensitive protein chemistry protocols and mass spectrometers, and by a new perspective on the importance of alternative splicing and coding SNP protein isoforms, the shortcomings of the existing tools are becoming increasingly apparent. We use a variety of computational techniques to improve the reliability of peptide identification analyses, as we seek to address the limitations of current tandem mass spectrometry search tools. First, we aggressively enumerate an inclusive set of potential peptide sequences from transcript evidence, particularly ESTs, to ensure that evidence of novel, unexpected, or unannotated protein isoforms is not missed by tandem mass spectrometry search engines. We use a novel compression technique to ensure that the resulting sequence database can be searched quickly and easily using existing tools, and demonstrate that novel peptides, representing coding SNPs, alternative splicing, and novel mutations, can be observed in publicly available datasets. Second, we apply hidden Markov models to spectral matching of tandem mass spectra of previously identified peptides, improving on the sensitivity and specificity of peptide identification by sequence database search engines and traditional spectral matching techniques. Lastly, we post-process search engine peptide identification results using an unsupervised, model-free, result-combining machine-learning approach that achieves better sensitivity and specificity than either result combining or machine learning alone.
Using this technique on datasets derived from standard protein mixtures, we demonstrate that the performance of the commercial search engine Mascot can be bested by combining the results of two open-source search engines, X!Tandem and OMSSA, and that using all three search engines is better still. Such a reliable, sensitive, and specific peptide identification analysis platform has the potential not only to explore a largely untapped source of potential cancer biomarkers from clinical cancer samples and cancer cell lines, but also to inform functional genomics and genome annotation. The characterization of expressed proteins using tandem mass spectrometry provides direct evidence for the amino-acid sequence of functional proteins and their isoforms, evidence which is not available using other high-throughput experimental techniques. We conclude with a discussion of unconventional experimental workflows for peptide identification and their potential to inform functional genome annotation.
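To make the idea of combining search engine results concrete, the sketch below keeps a peptide identification only when a minimum number of engines agree on it for the same spectrum. This is plain majority voting, substituted here for illustration; it is not the unsupervised machine-learning combiner described in the abstract. The function name `consensus_ids`, the `min_votes` parameter, the spectrum identifiers, and the peptide strings are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_ids(results, min_votes=2):
    """Keep, for each spectrum, the peptide reported by at least
    `min_votes` engines (simple voting, assumed for illustration)."""
    votes = defaultdict(Counter)
    for engine, ids in results.items():
        for spectrum, peptide in ids.items():
            votes[spectrum][peptide] += 1

    consensus = {}
    for spectrum, counter in votes.items():
        peptide, n = counter.most_common(1)[0]
        if n >= min_votes:
            consensus[spectrum] = peptide
    return consensus


# Made-up identifications: spectrum id -> top-ranked peptide per engine.
results = {
    "Mascot": {"s1": "PEPTIDEK", "s2": "LSSPATLNSR"},
    "X!Tandem": {"s1": "PEPTIDEK", "s2": "AAAK", "s3": "GGGR"},
    "OMSSA": {"s1": "PEPTIDEK", "s2": "LSSPATLNSR"},
}

consensus = consensus_ids(results, min_votes=2)
print(consensus)
```

Voting of this kind discards engine scores entirely; a learned combiner can instead weight each engine's evidence, which is presumably where the reported gains in sensitivity and specificity come from.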
PDE Seminar