Title: Meta-Learning for Data and Processing Efficiency
Authors: Ravi, Sachin
Advisors: Li, Kai
Contributors: Computer Science Department
Keywords: Deep Learning
Meta-Learning
Subjects: Artificial intelligence
Computer science
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: Deep learning models have shown great success on a variety of machine learning benchmarks; however, these models still lack the efficiency and flexibility of humans. Current deep learning methods involve training on a large amount of data to produce a model specialized to the task encoded by that data. Humans, by contrast, learn new concepts throughout their lives from comparatively little feedback. To bridge this gap, previous work has suggested the use of meta-learning. Rather than learning how to perform a specific task, meta-learning involves learning how to learn and using this knowledge to learn new tasks more effectively. This thesis focuses on using meta-learning to improve the data and processing efficiency of deep learning models when learning new tasks. First, we discuss a meta-learning model for the few-shot learning problem, where the aim is to learn a new classification task involving unseen classes from only a few labeled examples. We use an LSTM-based meta-learner model to learn both the initialization and the optimization algorithm used to train another neural network, and show that our method compares favorably to nearest-neighbor approaches. The second part of the thesis deals with improving the predictive uncertainty of models in the few-shot learning setting. Taking a Bayesian perspective, we propose a meta-learning method that efficiently amortizes hierarchical variational inference across tasks, learning a prior distribution over neural network weights such that a few steps of gradient descent produce a good task-specific approximate posterior. Finally, we focus on applying meta-learning to choices that affect processing efficiency. When training a network on multiple tasks, we have a choice between interactive parallelism (training on different tasks one after another) and independent parallelism (using the network to process multiple tasks concurrently). For the simulation environment considered, we show that deep neural networks exhibit a trade-off between these two processing modes. We then discuss a meta-learning algorithm that allows an agent to learn how to train itself with respect to this trade-off in an environment with an unknown serialization cost.
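As an illustrative sketch of the core idea behind the first contribution (not code from the thesis): an LSTM-style meta-learner can treat the learner's weights as its cell state and apply gated updates built from the loss gradient. The toy Python/NumPy example below is a minimal sketch under stated assumptions; the function names (meta_step, loss_and_grad) and the fixed scalar gate biases are illustrative stand-ins for the full learned LSTM gates.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def loss_and_grad(W, X, y):
        # Mean cross-entropy loss and its gradient for a linear softmax classifier.
        p = softmax(X @ W)
        n = len(y)
        loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
        p[np.arange(n), y] -= 1.0
        return loss, X.T @ p / n

    def meta_step(W, grad, forget_bias=4.0, input_bias=-1.0):
        # One gated update: W_t = f * W_{t-1} + i * (-grad).
        # Scalar sigmoid gates stand in for the LSTM gates, which in the
        # full method are learned functions of the loss, gradient, and weights.
        f = 1.0 / (1.0 + np.exp(-forget_bias))  # forget gate ~ mild weight decay
        i = 1.0 / (1.0 + np.exp(-input_bias))   # input gate ~ learning rate
        return f * W + i * (-grad)

    # Toy 5-way "episode" with random data; W stands in for a meta-learned init.
    X = rng.normal(size=(25, 8))
    y = rng.integers(0, 5, size=25)
    W = rng.normal(scale=0.1, size=(8, 5))
    for _ in range(5):
        loss, g = loss_and_grad(W, X, y)
        W = meta_step(W, g)
    print("loss after adaptation:", loss_and_grad(W, X, y)[0])

In the method the abstract describes, the gate values are produced by a trained LSTM rather than fixed biases, so both the effective learning rate and the initialization are meta-learned across many such few-shot episodes.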
URI: http://arks.princeton.edu/ark:/88435/dsp013j333513x
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File: Ravi_princeton_0181D_13033.pdf | Size: 2.24 MB | Format: Adobe PDF

