Stochastic Systems Seminar
Brown Applied Mathematics Pattern Theory and Vision Seminar
Abstract: Dimension reduction methods based on Singular Value Decomposition have been popularized in information retrieval under the name Latent Semantic Analysis (LSA). LSA has been used as a versatile tool for problems ranging from ad hoc retrieval and information filtering to text categorization and language modeling. One of the main conceptual drawbacks of LSA is its use of a squared-error loss function, which is inadequate for discrete data such as those arising in information retrieval. This talk presents a more rigorous approach to dimension reduction for discrete data with statistical subfamily models. The approach is based on a simple latent class model that can be fitted with a standard Expectation-Maximization algorithm. The talk will also discuss an information-geometric approach to deriving similarity functions from parametric models, as well as the use of annealing techniques to improve generalization performance. Experiments in automated indexing, language modeling, and text categorization confirm the advantages of our approach.
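As an illustration of the fitting step mentioned in the abstract, the following is a minimal sketch of Expectation-Maximization for a simple latent class model of document-word counts. It is not the speaker's implementation. It assumes the common mixture form P(d, w) = sum over z of P(z) P(d|z) P(w|z), which the abstract does not spell out, and it omits the annealing step. The function name `fit_latent_class_model` and its parameters are hypothetical.

```python
import math
import random


def fit_latent_class_model(counts, n_classes, n_iters=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) by EM (assumed model form, see lead-in).

    counts[d][w] is the number of times word w occurs in document d.
    Returns the three parameter tables and the log-likelihood trace.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(size):
        # Strictly positive random starting distribution.
        raw = [rng.random() + 0.1 for _ in range(size)]
        total = sum(raw)
        return [r / total for r in raw]

    p_z = [1.0 / n_classes] * n_classes
    p_d_z = [random_dist(n_docs) for _ in range(n_classes)]
    p_w_z = [random_dist(n_words) for _ in range(n_classes)]
    log_liks = []

    for _ in range(n_iters):
        new_z = [0.0] * n_classes
        new_d_z = [[0.0] * n_docs for _ in range(n_classes)]
        new_w_z = [[0.0] * n_words for _ in range(n_classes)]
        log_lik = 0.0

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over latent classes for this (d, w) pair.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_classes)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_classes):
                    resp = n * joint[z] / total
                    new_z[z] += resp
                    new_d_z[z][d] += resp
                    new_w_z[z][w] += resp

        # Log-likelihood under the parameters used in this E-step.
        log_liks.append(log_lik)

        # M-step: renormalize the expected counts.
        grand_total = sum(new_z)
        for z in range(n_classes):
            if new_z[z] > 0:  # leave an empty class's tables unchanged
                p_d_z[z] = [v / new_z[z] for v in new_d_z[z]]
                p_w_z[z] = [v / new_z[z] for v in new_w_z[z]]
            p_z[z] = new_z[z] / grand_total

    return p_z, p_d_z, p_w_z, log_liks


if __name__ == "__main__":
    # Toy corpus: 4 documents over a 4-word vocabulary (made-up counts).
    toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
    p_z, p_d_z, p_w_z, trace = fit_latent_class_model(toy_counts, n_classes=2)
    print("class priors:", p_z)
    print("log-likelihood: first", trace[0], "last", trace[-1])
```

Unlike a squared-error fit, each step here works with multinomial probabilities over counts, and the recorded log-likelihood should not decrease from one iteration to the next.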
Brown Analysis Seminar
PDE Seminar
Department of Mathematics Colloquium