Current Projects

Spike-based coding of temporal signals
Non-stationary acoustic features provide essential cues for many auditory tasks including sound localization, auditory stream analysis, and speech recognition. These features are best characterized relative to a precise point in time, such as the onset of a sound or the beginning of a harmonic periodicity. Extracting this structure with standard frame-based signal analysis methods, however, is difficult because the representation is sensitive to the arbitrary alignment of the frames. Convolutional techniques such as shift-invariant transformations can reduce this sensitivity, but they do not yield a code that is efficient, i.e., one that forms a non-redundant representation of the underlying structure. We have developed a non-frame-based method for signal representation that is both time-relative and efficient. Signals are represented using a linear superposition of time-shiftable kernel functions, each with an associated magnitude and temporal position. Signal decomposition in this method is a non-linear process that consists of optimizing the kernel function scaling coefficients and temporal positions to form an efficient, shift-invariant representation. This approach has direct relevance to neural coding at the auditory nerve and to the more general issue of how to encode complex, time-varying signals with a population of spiking neurons. For the complete story, check out Smith and Lewicki (Neural Computation 2005).
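As a rough illustration of the representation described above, the sketch below decomposes a signal into a few (kernel, time shift, amplitude) triples using greedy matching pursuit. Matching pursuit is used here only as one simple way to choose amplitudes and temporal positions; it is not claimed to be the exact optimization used in the paper. The decaying-sinusoid kernels, the function names, and all parameter values are assumptions made for the example.

```python
import math

def decaying_sinusoid(length, freq, decay):
    """Hypothetical unit-norm kernel; any other kernel shape could be substituted."""
    raw = [math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
           for t in range(length)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def add_kernel(signal, kernel, shift, amp):
    """Superimpose amp * kernel onto signal, starting at sample `shift`."""
    for i, k in enumerate(kernel):
        signal[shift + i] += amp * k

def matching_pursuit(signal, kernels, n_spikes):
    """Greedily select (kernel index, time shift, amplitude) triples."""
    residual = list(signal)
    spikes = []
    for _ in range(n_spikes):
        best = None
        for m, kernel in enumerate(kernels):
            for shift in range(len(residual) - len(kernel) + 1):
                # Inner product of the residual with the kernel placed at `shift`.
                corr = sum(residual[shift + i] * k for i, k in enumerate(kernel))
                if best is None or abs(corr) > abs(best[2]):
                    best = (m, shift, corr)
        m, shift, amp = best
        # Remove the selected event from the residual before the next pass.
        add_kernel(residual, kernels[m], shift, -amp)
        spikes.append((m, shift, amp))
    return spikes, residual

# Toy signal: two kernel instances at arbitrary, non-frame-aligned positions.
kernels = [decaying_sinusoid(32, 0.10, 0.15), decaying_sinusoid(32, 0.25, 0.15)]
signal = [0.0] * 200
add_kernel(signal, kernels[0], 20, 2.0)
add_kernel(signal, kernels[1], 120, -1.5)

spikes, residual = matching_pursuit(signal, kernels, n_spikes=2)
for m, shift, amp in spikes:
    print(f"kernel {m} at sample {shift} with amplitude {amp:.3f}")
```

Because each event carries its own temporal position, shifting the input by a few samples shifts the recovered positions by the same amount instead of changing the whole code, which is the property a fixed frame grid lacks.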
Learning the Structure of Speech and Other Natural Sounds
The auditory neural code must serve a wide range of auditory tasks that require exquisite sensitivity in time and frequency, and it must be effective over the diverse array of sounds present in natural acoustic environments. It has been suggested that sensory systems may have evolved highly efficient coding strategies in order to maximize the information conveyed to the brain while minimizing the required energy and neural resources. We find that when the acoustic features in a spike code are optimized for coding either natural sounds or speech, they show striking similarities to time-domain cochlear filter estimates, exhibit a frequency-bandwidth dependence similar to that of auditory nerve fibers, and yield significantly greater coding efficiency than conventional signal representations. These results suggest that the auditory code approaches an information-theoretic optimum and that the acoustic structure of speech is adapted to the coding capacity of the mammalian auditory system. The results from this work recently appeared in a Nature article; see also Smith and Lewicki (NIPS 2005).
Modeling higher-order structure in natural images
Linear componential models, such as PCA and ICA, widely used for learning the statistical structure in natural images, form distributed representations that are well suited to high dimensional data. However, because these models are linear and assume fixed statistics over the entire ensemble of the data, they are limited in the type of structure they can represent. In fact, while the goal of these models is to separate out the independent components in a signal, the resulting representation often exhibits residual dependencies and non-stationary statistics.

We proposed a generalization of the linear generative model in which we abandoned the assumption of independent linear basis function coefficients. Instead, we modeled the dependence among their variances with a non-linear "basis" comprised of a set of density components. This hierarchical structure allowed us to model the observed magnitude dependence and also described a non-stationary density that can change from sample to sample (i.e. the model infers, for each data point, a distribution most likely to have generated that point). The higher-order parameters (the density components) form a distributed representation of higher-order statistical regularities, and are learned directly from the data. The latent variables capture more complex image structure, and hence vary more slowly over the image than the filter outputs, leading to a more invariant representation (see image on the right). More info in Karklin and Lewicki (Neural Computation 2005).
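The variance dependence described above can be illustrated with a small sampling sketch. It assumes, purely for the example, that the log-variance of each coefficient is a linear function of a few Gaussian latent variables and that coefficients are Gaussian given their variances; the matrix B, the function names, and these distributional choices are hypothetical and are not taken from the paper.

```python
import math
import random

def log_variances(B, v):
    """Log-variance of each coefficient: one row of B per coefficient."""
    return [sum(b * vj for b, vj in zip(row, v)) for row in B]

def sample_coefficients(B, rng):
    """Draw latent variables, then coefficients whose variances depend on them."""
    v = [rng.gauss(0.0, 1.0) for _ in B[0]]
    lam = log_variances(B, v)
    # Given the variances, the coefficients are independent (assumed Gaussian).
    return [math.exp(0.5 * l) * rng.gauss(0.0, 1.0) for l in lam]

def magnitude_correlation(samples, i, j):
    """Pearson correlation between |u_i| and |u_j| across samples."""
    a = [abs(s[i]) for s in samples]
    b = [abs(s[j]) for s in samples]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Coefficients 0 and 1 share a density component; coefficient 2 uses another.
B = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0]]

rng = random.Random(0)
samples = [sample_coefficients(B, rng) for _ in range(5000)]
shared = magnitude_correlation(samples, 0, 1)
separate = magnitude_correlation(samples, 0, 2)
print(f"magnitude correlation, shared component:    {shared:.3f}")
print(f"magnitude correlation, separate components: {separate:.3f}")
```

In this toy setting the coefficients that share a density component have correlated magnitudes even though their signs are independent, which is the kind of residual dependence a purely linear model with independent coefficients cannot express.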
Is early vision optimized for higher-order dependencies?
ICA and sparse coding assume a fairly simple model for natural images and yield a set of filters that qualitatively resemble V1 receptive fields. However, the match between the population properties of V1 neurons and the derived filters is not great -- the algorithms produce a homogeneous population of filters that tend to cluster at high spatial frequencies. We used the hierarchical model described above, but this time learned the lower-level representation (i.e., the filters) at the same time as the higher-order model parameters, and compared the resulting filters to those derived by ICA and sparse coding. The learned filters were more varied and spanned a wider range of spatial frequencies, even though the marginal probability densities for the filter outputs assumed under the two models are similar. This suggests that V1 receptive fields function not only to make their outputs as independent as possible, but also to facilitate the processing of higher-order statistical regularities. For more info, check out our paper, Karklin and Lewicki (NIPS 2006).
Robust coding for a noisy neural population
How much information about a signal can be represented when the code is noisy and its precision is therefore limited? This is a common practical concern, and it is particularly relevant to biological neural representations because their coding precision is limited to a few bits per spike. We proposed a robust coding model that yields the optimal linear encoder and decoder adapted to the training data set. It can make the residual error arbitrarily small by employing a large number of coding units and forming the optimal population code. The proposed coding method was shown to outperform conventional image coding methods such as wavelet, ICA, and PCA coding. For details, see Doi, Balcan, and Lewicki (NIPS 2006).
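The effect of population size on residual error can be seen in a deliberately simplified one-dimensional case, sketched below. It assumes a scalar Gaussian signal, identical units with a fixed gain, and independent additive Gaussian noise on each unit; under those assumptions the best linear decoder has a closed form. This is a toy calculation with hypothetical function and parameter names, not the model from the paper.

```python
import math
import random

def decoder_weight(n_units, signal_var, gain, noise_var):
    """Optimal weight applied to the summed responses (linear MMSE decoder)."""
    return gain * signal_var / (n_units * gain ** 2 * signal_var + noise_var)

def optimal_mse(n_units, signal_var, gain, noise_var):
    """Closed-form error of that decoder: signal_var / (1 + N * SNR)."""
    snr = gain ** 2 * signal_var / noise_var
    return signal_var / (1.0 + n_units * snr)

def simulated_mse(n_units, signal_var, gain, noise_var, n_trials, rng):
    """Monte Carlo estimate of the reconstruction error."""
    a = decoder_weight(n_units, signal_var, gain, noise_var)
    total = 0.0
    for _ in range(n_trials):
        s = rng.gauss(0.0, math.sqrt(signal_var))
        # Each unit sees the same signal plus its own independent noise.
        responses = [gain * s + rng.gauss(0.0, math.sqrt(noise_var))
                     for _ in range(n_units)]
        s_hat = a * sum(responses)
        total += (s - s_hat) ** 2
    return total / n_trials

rng = random.Random(0)
for n in (1, 4, 16, 64):
    theory = optimal_mse(n, 1.0, 1.0, 1.0)
    empirical = simulated_mse(n, 1.0, 1.0, 1.0, 20000, rng)
    print(f"{n:3d} units: theory {theory:.4f}, simulated {empirical:.4f}")
```

With the per-unit noise level held fixed, the error falls roughly in proportion to one over the number of units, so it can be driven as low as desired by enlarging the population.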
Theoretical analysis of robust coding
We mathematically analyzed the robust coding model and characterized the optimal solutions. More specifically, we derived the error bound that can be achieved by any linear encoder and decoder. We also devised a diagram that shows graphically whether a given code satisfies the sufficient conditions for the optimal code. Our analysis revealed that, when the representational capacity is too limited, a degenerate code can best preserve signal information at the expense of ignoring the minor data component. Read more about this in Doi, Balcan, and Lewicki (NIPS 2006).


Robust and efficient coding
We extended robust coding so that it exhibits coding efficiency while maintaining reconstruction accuracy. Without this additional constraint, the encoding vectors of robust coding do not have any clear structure. When the coding efficiency constraint is included, they become spatially localized and similar to simple-cell receptive fields found in the primary visual cortex. Some of the results were presented in Doi and Lewicki (NIPS 2005).
Intrinsic Structures of Impact Sounds
Models of sounds have proven useful in many fields, such as sound synthesis, sound recognition and identification of events or properties (like material or length) of the objects involved. However, developing such models is hard due to all the complexities of real sounds.

Natural sounds of the same type have a rich variability in their acoustic structure. For example, different impacts on the same rod can generate very different acoustic waveforms. In natural environments there is variability due to reverberation and background noise, but even when the sounds are recorded in anechoic conditions there is variability due to factors such as slight variations in the impact force and location. (For instance, the figure on the left shows that, even though different impacts on the same rod have very similar spectra, the relative power and duration of the partials vary from one instance to the other. These differences cannot be explained by a simple variation in amplitude.) In spite of these variations, when the sounds are heard they are often perceived as almost identical, which suggests that they share some common intrinsic structures.

We are developing data-driven methods for learning the intrinsic features that govern the acoustic structure of impact sounds. These methods require no a priori knowledge of the physics, dynamics and acoustics, and are used to create models of impact sounds that represent a rich variety of structure and variability in the sounds.
Environmental sound recognition
Environmental sound recognition systems are intended to distinguish different categories of sounds, where sounds from different categories usually have very different spectral and temporal characteristics. Typical examples of such categories are door bells, waves, dog barking, whistles, footsteps, and keyboards. These sounds are produced not only by different types of objects but also by different types of events. We have been investigating the possibility of building environmental sound recognizers that differ from the recognizers described above in that they are intended to distinguish sounds produced by very similar objects and by the same type of event.