Division of Computing Science and Mathematics, University of Stirling

Proposed PhD project: Bio-inspired Feature Detection for Sound Streaming and Interpretation

Supervisor: Professor Leslie Smith

Aim

To develop biologically inspired techniques for the detection of spectro-temporal features in sound which may be used (i) for separating foreground sound streams from background sound streams and (ii) for the interpretation of the foreground sound streams.

Background

Historically, there has been a huge amount of work on sound interpretation, particularly for speech recognition. However, most of this work assumes a high (>30 dB) signal-to-noise ratio, and therefore attempts to interpret the whole signal detected at the microphone. Unfortunately, unless the speech is being collected by a microphone directly at the speaker, this assumption does not usually hold. In recent years there has been work on Computational Auditory Scene Analysis (CASA), which attempts to separate the sound into a number of sound streams, each from a different sound source. This approach normally starts by band-pass filtering the sound into multiple channels, usually using a biologically inspired filter bank that follows cochlear processing, unlike the Fourier-transform approach taken in earlier speech recognition systems.

We have followed this approach, and extended it by using an (auditory-nerve-like) spike-based coding which provides precise timing and can cope with the very wide dynamic range of sound signals. We have already used the onset features detected in this way for sound source direction finding and some basic interpretation [1, 2]. Recently, we have extended the types of (proto-)feature that can be discovered by using a two-dimensional spectro-temporal window operator, implemented as a set of synapses (whose weights and delays encode the window operator) from the spike-coded signal to a leaky integrate-and-fire neuron. This approach appears promising, since proto-feature detection can be made independent of signal level, and proto-features which are invariant under the usual variation in listening environments can be chosen. Further, this approach can use greedy (parallel) processing, taking advantage of modern CPUs and signal processing technologies.

In addition, the Department has a SRIF-funded lab which can be used for multi-stream sound acquisition, and in which the reverberance of the environment can be varied.
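To make the window-operator idea concrete, the following is a minimal sketch, not the implementation used in [1, 2]. It assumes discrete time steps, one spike train per band-pass channel, and a simple multiplicative leak. The names Synapse and lif_detect, and the values of leak and threshold, are hypothetical and chosen only for illustration. Each synapse delays and weights the spikes from one channel, so a neuron whose delays match an upward frequency sweep receives all of its inputs at the same time step and fires, while a downward sweep arrives spread out in time and leaks away.

```python
from dataclasses import dataclass


@dataclass
class Synapse:
    channel: int    # index of the band-pass channel this synapse listens to
    delay: int      # time steps before a spike on that channel reaches the neuron
    weight: float   # amount one spike adds to the membrane potential


def lif_detect(spikes, synapses, leak=0.5, threshold=2.5):
    """Return the time steps at which a leaky integrate-and-fire neuron fires.

    spikes   -- iterable of (time_step, channel) pairs (the spike-coded signal)
    synapses -- list of Synapse; together they encode one spectro-temporal window
    leak     -- fraction of the potential kept per time step (illustrative value)
    threshold -- firing threshold (illustrative value)
    """
    # Accumulate the weighted input by its arrival time (spike time + delay).
    arrivals = {}
    for t, channel in spikes:
        for syn in synapses:
            if syn.channel == channel:
                arrivals[t + syn.delay] = arrivals.get(t + syn.delay, 0.0) + syn.weight
    if not arrivals:
        return []

    fired = []
    potential = 0.0
    for t in range(min(arrivals), max(arrivals) + 1):
        # Leak first, then integrate whatever arrives at this step.
        potential = potential * leak + arrivals.get(t, 0.0)
        if potential >= threshold:
            fired.append(t)
            potential = 0.0  # reset after firing
    return fired


# Window tuned to an upward sweep: lower channels spike earlier, so they are
# delayed more, and all three inputs coincide at the neuron.
up_sweep_window = [Synapse(0, 2, 1.0), Synapse(1, 1, 1.0), Synapse(2, 0, 1.0)]

rising = [(0, 0), (1, 1), (2, 2)]    # channel 0 first, then 1, then 2
falling = [(0, 2), (1, 1), (2, 0)]   # channel 2 first, then 1, then 0

print(lif_detect(rising, up_sweep_window))
print(lif_detect(falling, up_sweep_window))
```

In this sketch the response depends only on spike timing and on which channels fire, not on the amplitude of the underlying signal, which is one way to picture how such a detector could be made independent of signal level.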

Outline Programme

The research will assess which proto-features, and combinations of these proto-features, are useful in foreground/background sound signal separation and in the interpretation of sound (including, but not only, speech). The techniques will recode the sound as a set of sequences of proto-features (and features combined from these proto-features). These sequences will be interpreted, and the mapping from proto-features to features will be adapted (for example, using neural net technologies).

Benefits

We aim to contribute to the field of sound interpretation, where the sounds are those likely to be encountered by pervasive equipment, such as autonomous robots or other equipment that will need to be commanded or to interact with its (auditory) environment. This will be important for better man/machine and machine/environment interaction.

Collaborators

This work builds on existing work at Stirling. It also relates closely to work by Hussain on speech interpretation in vehicles, and has a more tenuous connection to work by Graham on the neurophysiology of hearing. It relates strongly to a project being prepared with Edinburgh University on MEMS/CMOS bandpassing microphones for sound interpretation: these projects may converge, and this would provide an excellent base for further funding.

References

1. L. S. Smith, S. Collins, Determining ITDs using two microphones on a flat panel during onset intervals with a biologically inspired spike-based technique, IEEE Transactions on Audio, Speech, and Language Processing, 15(8), 2278-2286, Nov. 2007.
2. L. S. Smith, D. S. Fraser, Robust sound onset detection using leaky integrate and fire neurons with depressing synapses, IEEE Transactions on Neural Networks, 15(5), 1125-1134, Sept. 2004.

Last updated: Tuesday, 16-Sep-2008 09:29:28 BST

