Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01tm70mx49v
Title: An in vivo Mutation Accumulation Analysis of Saccharomyces Cerevisiae Cells Exposed to DNA Crosslinking Agents and Programs for Improving the W303 Draft Genome
Authors: Richards, Jonathan
Advisors: Gammie, Alison
Department: Computer Science
Class Year: 2015
Abstract: Humans are constantly exposed to crosslinking agents such as cisplatin, a common chemotherapy agent, and ultraviolet B radiation (UVB), which is present in natural light. Xeroderma pigmentosum (XP) patients are especially vulnerable to such crosslinking agents, as they lack the nucleotide excision repair (NER) pathway. Saccharomyces cerevisiae is a good model organism for humans, as methods of DNA crosslink repair, including NER and homologous recombination repair (HRR), are highly conserved. Additionally, RAD14-defective strains can be used to model certain types of XP. We use the W303 and S288C strains of S. cerevisiae as model organisms to characterize mutations caused by cisplatin and UVB in vivo. Both haploid and diploid strains were exposed to UVB or cisplatin for 80 generations. Full genome sequencing revealed 992 point mutations in UVB-treated strains and 185 point mutations in cisplatin-treated strains. Motifs were generated using MEME. Six motifs containing UVB-induced mutations were found, primarily indicating damage around TpC and TpTpA sequences. Two motifs containing 104 cisplatin-induced mutations were found, but targeted sequences were not readily apparent. Very few point mutations occurred within coding sequences, as expected. Full genome sequencing also revealed losses of heterozygosity in diploid strains, which occur during HRR.
High-throughput sequencing is a relatively novel concept that has allowed us to study mutations at the genomic level in vivo. We introduce several programs to assist in genome-wide mutation analysis and demonstrate their effectiveness in the analysis of mutation accumulation data. The first program, “TRTR: Trim Reads of Tandem Repeats”, preprocesses DNA reads so that the only reads aligned to short tandem repeat (STR) regions are reads that span the region, making automated mutation calling more accurate. The second program, “FastNW: Fast Needleman-Wunsch”, is a Python module for computing pairwise Needleman-Wunsch alignments significantly faster and with vastly improved memory management as compared to existing alternatives. Finally, we make available a pipeline for constructing reference genomes using sequencing reads and a reference genome for a similar organism, taking advantage of the regions of similarity between the two organisms to eliminate reads from the assembly. We tested this pipeline by building a reference genome for W303 from short (100 bp) Illumina sequencing reads and the S288C reference genome. Initially, 85% of W303 reads aligned perfectly to S288C, compared to 97% of S288C reads. Our pipeline produced a reference genome to which 91% of W303 reads aligned.
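For readers unfamiliar with the Needleman-Wunsch algorithm named in the abstract, the following is a minimal sketch of the textbook global-alignment score computation. It is not the thesis's FastNW code, whose interface is not described in this record. The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration. Only two rows of the dynamic-programming table are kept at a time, which is one common way to reduce memory when only the score is needed.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the Needleman-Wunsch global alignment score of strings a and b.

    Hypothetical helper for illustration; scoring parameters are assumed,
    not taken from the thesis.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical or horizontal move: insert a gap in one sequence.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(nw_score("ACGT", "ACT"))
```

Keeping only two rows gives memory proportional to the shorter dimension of the table rather than its full area, but recovering the alignment itself (not just the score) would need either the full table or a divide-and-conquer scheme such as Hirschberg's algorithm.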
Extent: 62 pages
URI: http://arks.princeton.edu/ark:/88435/dsp01tm70mx49v
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Computer Science, 1987-2023

Files in This Item:
File: PUTheses2015-Richards_Jonathan.pdf
Size: 5.15 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.