Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01hd76s315r
Title: Bayesian Modeling of Single Cell Expression
Authors: Verma, Archit
Advisors: Engelhardt, Barbara E
Contributors: Chemical and Biological Engineering Department
Keywords: Computational Biology; Machine Learning; Point Process; RNA-seq; Single Cell RNA-seq
Subjects: Bioengineering; Statistics; Applied mathematics
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: Researchers interested in RNA expression now have tools to interrogate expression at the single cell level, providing a window into the fundamental unit of biology. RNA expression of different genes can be captured across cell populations, across spatial locations, and across time; however, extracting accurate and actionable insights from these novel data sources requires proper statistical analysis. Modeling these data can be difficult due to the stochasticity of biological processes and measurement techniques. Bayesian probabilistic models are ideal candidates for analyzing these data. Generative models account for the stochastic behavior of biological systems and the measurement error inherent to experimental data. This thesis leverages Bayesian models to better understand gene expression in individual cells. We leverage three stochastic processes (Dirichlet processes, Gaussian processes, and point processes) as flexible priors for modeling cell behavior in different domains. These models improve on previous methods by quantifying uncertainty, preventing mode collapse, and providing more interpretable parameters. The following chapters describe the development and application of models to four types of RNA expression data: 1) traditional bulk RNA-sequencing (RNA-seq) expression data, 2) dissociated cells sequenced by single cell RNA-sequencing (scRNA-seq), 3) in situ single cell RNA expression, and 4) spatiotemporal gene expression from fluorescence imaging. Modeling each data type requires statistical models with the appropriate assumptions. In this thesis I propose and evaluate probabilistic models for each case. I develop a class of deconvolution models to learn individual cell type expression from bulk RNA-seq. I use Gaussian processes to perform unsupervised, robust dimension reduction on high-dimensional scRNA-seq data for visualization and regularization, enabling downstream analysis.
I demonstrate semi-supervised learning with Gaussian processes as a powerful tool for integrating multiple single cell data sets across modalities, finding new insights from seqFISH+ data in conjunction with traditional RNA-seq. Finally, I use point processes to identify spatial signaling patterns from expression time series. In each case, I demonstrate how my model improves on previous methods and allows for novel insights into these data. By developing appropriate Bayesian models, this thesis demonstrates new insights from novel experimental RNA expression data that can be generalized to future experiments and technologies.
URI: http://arks.princeton.edu/ark:/88435/dsp01hd76s315r
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Chemical and Biological Engineering

Files in This Item:
File: Verma_princeton_0181D_13697.pdf
Size: 46.85 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.