David Nicholson

Tuesday, April 23rd, 2019
Monday, November 26th, 2018

Thrillington: lessons learned from replicating a recurrent model of visual attention

Neural networks for computer vision process an entire image in parallel, but humans and many other animals fixate on a series of points in a scene as they scan it for information. This active sampling of the visual environment is called “overt attention”, and it has inspired several threads of artificial intelligence research. One example can be found in the 2014 paper Recurrent Models of Visual Attention (RAM) by Mnih et al. (https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention). The RAM model is a recurrent network, trained in part with reinforcement learning, that integrates a series of small “glimpses” to classify images. In this talk I will discuss my results from implementing this model (https://github.com/NickleDave/thrillington). I will begin with a brief introduction to the stochastic units used in the RAM model and how they can be trained with the REINFORCE algorithm. Along the way I’ll point out issues with some existing publicly available implementations, as well as some “tricks of the trade” for training reinforcement learning agents. Then I will provide an analysis of the “behavior” of this model and how that behavior depends on its hyperparameters. Lastly I’ll talk about work in progress benchmarking this model and others (https://github.com/NickleDave/aver) with a visual search task (https://github.com/NickleDave/searchstims).
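For readers unfamiliar with REINFORCE, here is a minimal PyTorch sketch of how a stochastic glimpse-location unit like the one in RAM can be trained. It is an illustration only, not code from thrillington; all tensor names and sizes are placeholders.

```python
import torch
from torch.distributions import Normal

# Illustrative stand-ins for one training batch (not the thrillington API):
batch_size = 32
loc_mean = torch.zeros(batch_size, 2, requires_grad=True)  # output of a location network
baseline = torch.zeros(batch_size)                         # learned baseline, used to reduce variance
reward = torch.randint(0, 2, (batch_size,)).float()        # 1 if the final classification was correct

# Stochastic unit: sample each glimpse location from a Gaussian centered on loc_mean.
loc_dist = Normal(loc_mean, 0.15)
loc = loc_dist.sample()  # sampling is non-differentiable, hence REINFORCE

# REINFORCE: weight the log-probability of the sampled location by the
# baseline-subtracted reward, so locations that led to correct answers become more probable.
log_prob = loc_dist.log_prob(loc).sum(dim=-1)
advantage = reward - baseline
reinforce_loss = -(log_prob * advantage).mean()
reinforce_loss.backward()  # gradient flows back into the location network via loc_mean
```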

A comparative study of neural network architectures for segmentation: finding syllables in birdsong

Vocalizations such as speech can be segmented into units like words or phonemes. Neural networks for speech-to-text typically avoid segmenting the input into such units, instead mapping directly from input features to a sequence of text. Sidestepping the segmentation problem allows these networks to achieve high accuracy. However, there are many cases where it is desirable to find segments, such as diagnosing speech disorders.
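One common mechanism for sidestepping segmentation is a loss such as CTC, which marginalizes over all possible alignments between input frames and output characters. The toy PyTorch snippet below is only meant to illustrate that idea; it is an assumption about the kind of network referred to above, not code from any of the work discussed here.

```python
import torch
import torch.nn as nn

# Toy dimensions: 100 input frames, batch of 4, 28 output classes (blank + 27 characters).
T, N, C = 100, 4, 28
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=-1)  # per-frame scores from some acoustic model
targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # target character sequences, with no timing information
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# CTC sums over all alignments between frames and characters, so the network
# is never told where one unit ends and the next begins.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```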

Another case where it is desirable to find segments is birdsong. Songbirds learn their songs as juveniles from adult tutors, much like a baby learns to talk from its parents. Many neuroscientists study songbirds to understand how brains learn and produce speech and similar complex motor skills, such as playing the piano. However, the field has been held back by the inability to fully automate the process of segmenting song into its elements, often called syllables, and then labeling those syllables. Because of this bottleneck, often only a small portion of the thousands of songs collected in a behavioral experiment can be analyzed.

Previously, my collaborators and I showed that birdsong provides a good testbed for networks that segment vocalizations, and that a hybrid convolutional-recurrent neural network can outperform previously proposed architectures that segment birdsong into syllables.

Video

GitHub Page
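As a rough illustration, a hybrid convolutional-recurrent segmenter of the kind described above might look like the following PyTorch sketch. The layer sizes and names are placeholder assumptions, not the architecture from the linked repository.

```python
import torch
import torch.nn as nn

class ConvRecurrentSegmenter(nn.Module):
    """Toy hybrid convolutional-recurrent net that labels every time bin of a spectrogram."""
    def __init__(self, n_freq_bins=128, n_classes=10, hidden_size=64):
        super().__init__()
        # Convolutional front end: pool over frequency only, so the time axis is preserved.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(4, 1)),
        )
        self.rnn = nn.LSTM(32 * (n_freq_bins // 4), hidden_size,
                           batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, spectrogram):
        # spectrogram: (batch, 1, n_freq_bins, n_time_bins)
        features = self.conv(spectrogram)                 # (batch, 32, n_freq_bins // 4, time)
        batch, channels, freq, time = features.shape
        features = features.permute(0, 3, 1, 2).reshape(batch, time, channels * freq)
        context, _ = self.rnn(features)                   # recurrent pass over time bins
        return self.classify(context)                     # (batch, time, n_classes): one label per bin

net = ConvRecurrentSegmenter()
frame_labels = net(torch.randn(2, 1, 128, 200))  # e.g. 2 spectrograms, 200 time bins each
```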

However, it remains unclear whether networks for segmentation require recurrent connections, or whether fully convolutional architectures can recover segments just as well. It is also unclear how robust the different architectures are to noise. Initial results suggest that two types of convolutional networks (encoder-decoder and dilated convolutional) yield segmentations as good as those from a network with a recurrent connection. Furthermore, the convolutional networks can be trained in about a third of the time. Experiments in progress test how robust these networks are to the presence of noise. I will discuss whether recurrent connections are truly advantageous in neural networks for segmentation, or whether convolutional architectures can be competitive whenever the input can be represented as an image.
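For comparison, a purely convolutional alternative can be sketched with dilated convolutions along the time axis, so that each output bin sees a wide temporal context without any recurrent connection. Again, the layer sizes here are illustrative assumptions, not the networks benchmarked in the talk.

```python
import torch
import torch.nn as nn

def dilated_conv_segmenter(n_freq_bins=128, n_classes=10, channels=64, n_layers=4):
    """Toy stack of dilated 1-D convolutions over time; placeholder sizes only."""
    layers = [nn.Conv1d(n_freq_bins, channels, kernel_size=3, padding=1)]
    for i in range(1, n_layers):
        dilation = 2 ** i  # receptive field roughly doubles with each layer
        layers += [nn.ReLU(),
                   nn.Conv1d(channels, channels, kernel_size=3,
                             padding=dilation, dilation=dilation)]
    layers += [nn.ReLU(), nn.Conv1d(channels, n_classes, kernel_size=1)]
    return nn.Sequential(*layers)

net = dilated_conv_segmenter()
spectrogram = torch.randn(2, 128, 200)   # (batch, n_freq_bins, n_time_bins)
frame_scores = net(spectrogram)          # (batch, n_classes, n_time_bins): one prediction per bin
```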

BIO

David Nicholson (https://nicholdav.info/) is a neuroscientist at Emory University in Atlanta, Georgia. He works in the Prinz lab in the Biology department, developing brain-inspired continual machine learning algorithms as a member of a multi-university team on a DARPA project. He also works in applied machine learning in the area of animal vocalizations. In collaboration with Yarden Cohen, he developed a library to help researchers use neural networks for automated segmentation and annotation of vocalizations (https://github.com/NickleDave/vak). They have used this library to benchmark the first neural net architecture capable of accurately segmenting and labeling syllables in hundreds of hours of complex birdsong, such as that of the canary (https://github.com/yardencsGitHub/tweetynet). These projects began during his graduate studies in Sam Sober’s lab at Emory, where his dissertation work showed that connections known to be important for learning motor skills in humans and other mammals are also found in regions of the songbird brain that are required to learn song. David also maintains several Python packages and tools related to his research (more at https://github.com/NickleDave/MetaNickleDave).