Spike-based coding of temporal signals
Non-stationary acoustic features provide essential cues for many
auditory tasks including sound localization, auditory stream analysis,
and speech recognition. These features can be best characterized
relative to a precise point in time such as the onset of a sound or
the beginning of a harmonic periodicity. Extracting this structure
with standard frame-based signal analysis methods, however, is
difficult due to the sensitivity of the representation to the
arbitrary alignment of the frames. Convolutional techniques such as
shift-invariant transformations can reduce this sensitivity, but these
do not yield a code that is efficient, i.e. one that forms a
non-redundant representation of the underlying structure. We have
developed a non-frame based method for signal representation that is
both time-relative and efficient. Signals are represented using a
linear superposition of time-shiftable kernel functions each with an
associated magnitude and temporal position. Signal decomposition in
this method is a non-linear process that consists of optimizing the
kernel function scaling coefficients and temporal positions to form an
efficient, shift-invariant representation. This approach has direct
relevance to the neural coding at the auditory nerve and the more
general issue of how to encode complex, time-varying signals with a
population of spiking neurons. For the complete story, check out Smith and Lewicki
(Neural Computation 2005).
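As a rough illustration of this kind of decomposition, the sketch below uses greedy matching pursuit, one common way to choose kernel positions and amplitudes. The summary above does not name the optimization procedure, so the choice of algorithm is an assumption. The gammatone-shaped kernels, their parameters, and the function names (`gammatone`, `matching_pursuit`, `reconstruct`) are likewise illustrative and not taken from the paper.

```python
import numpy as np

def gammatone(center_hz, sr, duration=0.02, order=4, bandwidth_hz=100.0):
    """Unit-norm gammatone-like kernel (an illustrative kernel shape)."""
    t = np.arange(int(duration * sr)) / sr
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth_hz * t)
    k = envelope * np.cos(2 * np.pi * center_hz * t)
    return k / np.linalg.norm(k)

def matching_pursuit(signal, kernels, n_spikes):
    """Greedily decompose `signal` into (kernel index, shift, amplitude) triples."""
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for m, k in enumerate(kernels):
            # Inner product of the residual with kernel m at every valid shift.
            corr = np.correlate(residual, k, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (m, tau, float(corr[tau]))
        m, tau, amp = best
        # Remove the selected kernel's contribution from the residual.
        residual[tau:tau + len(kernels[m])] -= amp * kernels[m]
        spikes.append((m, tau, amp))
    return spikes, residual

def reconstruct(spikes, kernels, length):
    """Linear superposition of shifted, scaled kernels."""
    out = np.zeros(length)
    for m, tau, amp in spikes:
        out[tau:tau + len(kernels[m])] += amp * kernels[m]
    return out

# Usage: build a signal from two known "spikes" and recover them.
sr = 16000
kernels = [gammatone(500.0, sr), gammatone(2000.0, sr)]
signal = reconstruct([(0, 100, 1.0), (1, 600, 0.5)], kernels, 1000)
spikes, residual = matching_pursuit(signal, kernels, n_spikes=2)
print(spikes)
print(np.linalg.norm(residual))
```

Because each spike carries its own temporal position, shifting the input by some number of samples shifts the recovered positions by the same amount and leaves the amplitudes unchanged, which is the time-relative property described above.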
|
Learning the Structure of Speech and Other Natural Sounds
The auditory neural code must serve a wide range of auditory tasks that
require exquisite sensitivity in time and frequency and be effective over
the diverse array of sounds present in natural acoustic environments. It has
been suggested that sensory systems may have evolved highly efficient coding
strategies in order to maximize the information conveyed to the brain while
minimizing the required energy and neural resources. We find that when the
acoustic features in a spike code are optimized for coding either natural
sounds or speech, they show striking similarities to time-domain cochlear
filter estimates, exhibit a frequency-bandwidth dependence similar to that
of auditory nerve fibers, and yield significantly greater coding efficiency
compared to conventional signal representations. These results suggest that
the auditory code approaches an information theoretic optimum and that the
acoustic structure of speech is adapted to the coding capacity of the
mammalian auditory system.
The results from this work recently appeared in a
Nature article; see also
Smith and Lewicki (NIPS 2005).
|
Modeling higher-order structure in natural images
We proposed a generalization of the linear generative model in which we abandoned the assumption that the linear basis function coefficients are independent. Instead, we modeled the dependence among their variances with a non-linear "basis" composed of a set of density components. This hierarchical structure allowed us to model the observed magnitude dependence and also described a non-stationary density that can change from sample to sample (i.e., the model infers, for each data point, the distribution most likely to have generated that point). The higher-order parameters (the density components) form a distributed representation of higher-order statistical regularities and are learned directly from the data. The latent variables capture more complex image structure and hence vary more slowly over the image than the filter outputs, leading to a more invariant representation (see image on the right). More info in Karklin and Lewicki (Neural Computation 2005). |
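A minimal generative sketch of this hierarchy follows, under assumptions not stated in the summary above: the log-scale of each coefficient is taken to be a linear function `B @ v` of sparse latent variables `v`, the coefficients are Laplacian given their scale, and the matrices `A` and `B` are made up for illustration rather than learned. The function name `sample_hierarchical` is hypothetical.

```python
import numpy as np

def sample_hierarchical(A, B, n_samples, rng):
    """Draw samples from a two-layer model: v -> coefficient scales -> u -> x."""
    n_latent = B.shape[1]
    # Sparse higher-order variables, one vector per sample.
    v = rng.laplace(size=(n_samples, n_latent))
    # Each column of B acts as a "density component": it raises or lowers
    # the log-scale of a group of coefficients together.
    log_scale = v @ B.T
    # Coefficients are independent given their scales, but the scales
    # change from sample to sample (a non-stationary density).
    u = rng.laplace(scale=np.exp(log_scale))
    # Linear generative model for the data.
    x = u @ A.T
    return x, u, v

# Usage: one density component that couples the scales of coefficients 0 and 1.
rng = np.random.default_rng(0)
A = np.eye(4)
B = np.array([[1.0], [1.0], [0.0], [0.0]])
x, u, v = sample_hierarchical(A, B, 20000, rng)

log_mag = np.log(np.abs(u))
# Log-magnitudes of coupled coefficients are correlated ...
print(np.corrcoef(log_mag[:, 0], log_mag[:, 1])[0, 1])
# ... while their signs remain uncorrelated.
print(np.mean(np.sign(u[:, 0]) * np.sign(u[:, 1])))
```

In this toy setting the coefficients are uncorrelated in sign yet dependent in magnitude, which is the kind of residual dependence the higher-order layer is meant to capture.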
Is early vision optimized for higher-order dependencies?
ICA and sparse coding assume a fairly simple model for natural images
and yield a set of filters that qualitatively resemble V1 receptive
fields. However, the match between the population properties of V1
neurons and the derived filters is poor: the algorithms produce
a homogeneous population of filters that tend to cluster at high
spatial frequencies. We used the hierarchical model described above,
but this time learned the lower level representation (i.e. the
filters) at the same time as the higher-order model parameters and
compared them to the filters derived by ICA and sparse coding. The
learned filters were more varied and spanned a wider range of spatial
frequencies, even though
the marginal probability densities for the filter outputs assumed
under the two models are similar. This suggests that V1 receptive
fields function not only to make their outputs as independent as
possible, but also to facilitate the processing of higher-order
statistical regularities. For more info, check out our paper, Karklin and Lewicki (NIPS
2006).
|
Robust coding for a noisy neural population
How much information about the signal can be represented when the code
is noisy and hence its precision is limited? This is a common
practical concern and is particularly relevant to biological neural
representations, whose coding precision is limited to only a few
bits per spike. We proposed a robust coding model that
yields the optimal linear encoder and decoder adapted to the training data
set. The residual error can be made arbitrarily small by employing
a large number of coding units and forming the optimal population
code. The proposed coding method was shown to outperform conventional
image coding methods such as wavelet, ICA, and PCA. For details, see Doi, Balcan, and Lewicki
(NIPS 2006).
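The effect of adding coding units can be illustrated with a small numerical sketch. It assumes a linear encoder `W`, additive Gaussian channel noise of variance `noise_var` on each unit, and the standard linear minimum-mean-square-error decoder for a fixed encoder. The encoder here is random, with each unit scaled to unit signal variance; it is not the optimized encoder of the paper, and all function names are hypothetical.

```python
import numpy as np

def optimal_decoder(W, cov_x, noise_var):
    """Linear MMSE decoder for r = W x + n, with n ~ N(0, noise_var * I)."""
    cov_r = W @ cov_x @ W.T + noise_var * np.eye(W.shape[0])
    return cov_x @ W.T @ np.linalg.inv(cov_r)

def reconstruction_mse(W, cov_x, noise_var):
    """Expected squared reconstruction error under the MMSE decoder."""
    A = optimal_decoder(W, cov_x, noise_var)
    return float(np.trace(cov_x - A @ W @ cov_x))

def random_encoder(n_units, cov_x, rng):
    """Random encoding vectors, each scaled so its noiseless output has variance 1.

    Fixing the signal variance per unit fixes each unit's signal-to-noise
    ratio, a simple stand-in for limited coding precision.
    """
    W = rng.standard_normal((n_units, cov_x.shape[0]))
    signal_var = np.einsum("ij,jk,ik->i", W, cov_x, W)
    return W / np.sqrt(signal_var)[:, None]

# Usage: the error shrinks as more noisy units are added to the population.
rng = np.random.default_rng(0)
cov_x = np.diag([4.0, 2.0, 1.0, 0.5])
W_full = random_encoder(128, cov_x, rng)
for n_units in (2, 8, 32, 128):
    print(n_units, reconstruction_mse(W_full[:n_units], cov_x, noise_var=1.0))
```

Even with unoptimized encoding vectors, each additional noisy unit lowers the achievable error; optimizing the encoder as well, as the model above does, should only improve on this baseline.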
|
Theoretical analysis of robust coding
We mathematically analyzed the robust coding model and characterized
the optimal solutions. More specifically, we derived the error bound
that can be achieved by any linear encoder and decoder. We also
devised a diagram that shows graphically whether a given code
satisfies the sufficient conditions for an optimal code. Our
analysis revealed that a degenerate code can best preserve signal
information at the expense of ignoring the minor data component when
the representational capacity is too limited. Read more about this in Doi, Balcan, and Lewicki
(NIPS 2006).
|
Robust and efficient coding
We extended robust coding so that it exhibits coding efficiency while
maintaining the reconstruction accuracy. Without this additional
constraint, the encoding vectors of robust coding do not have any
clear structure. When the coding efficiency constraint is included,
they become spatially localized and similar to simple-cell receptive
fields found in the primary visual cortex. Some of the results were
presented in Doi and Lewicki
(NIPS 2005).
|
Intrinsic Structures of Impact Sounds
Models of sounds have proven useful in many fields, such as sound
synthesis, sound recognition and identification of events or
properties (like material or length) of the objects involved. However,
developing such models is hard due to all the complexities of real
sounds.
Natural sounds of the same type have a rich variability in their acoustic structure. For example, different impacts on the same rod can generate very different acoustic waveforms. In natural environments there is variability due to reverberation and background noise, but even when the sounds are recorded in anechoic conditions there is variability due to factors such as slight variations in the impact force and location. (For instance, the figure on the left shows that, even though different impacts on the same rod have very similar spectra, the relative power and duration of the partials vary from one instance to the other. These differences cannot be explained by a simple variation in amplitude.) In spite of these variations, when the sounds are heard they are often perceived as almost identical, suggesting that they share some common intrinsic structures. We are developing data-driven methods for learning the intrinsic features that govern the acoustic structure of impact sounds. These methods require no a priori knowledge of the physics, dynamics, or acoustics, and are used to create models of impact sounds that represent a rich variety of structure and variability in the sounds. |
Environmental sound recognition
Environmental sound recognition systems are intended to distinguish
different categories of sounds, where sounds from different categories
usually have very different spectral and temporal characteristics. A
typical set of such categories is door bells, waves, dog barking,
whistles, footsteps, and keyboard sounds. These sounds are not only produced
by different types of objects but also by different types of
events. We have been investigating the possibility of building
environmental sound recognizers that differ from those
described above in that they are intended to distinguish sounds produced by
very similar objects and by the same type of event.
|