I am active in the development of multimodal speech enhancement, working towards a new direction in hearing aid research: using cameras as part of a future listening device. This involves automatic Region of Interest (ROI) tracking to extract lip information, which is then used to process noisy audio. I also use conventional audio-only speech filtering algorithms to reduce noise in speech. An up-to-date (August 2013) poster can be viewed by clicking on this link.
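As a rough illustration of the conventional audio-only filtering mentioned above, the sketch below implements basic spectral subtraction, one of the classic noise-reduction techniques in this family. It is a minimal, generic example, not the specific algorithm used in my research: the noise spectrum is estimated from the first few frames (assumed to be speech-free), subtracted from each frame's magnitude spectrum, and the frame is resynthesised with the noisy phase.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, noise_frames=5):
    """Basic spectral subtraction: estimate the noise magnitude spectrum
    from the first `noise_frames` frames (assumed speech-free), subtract
    it from every frame, and resynthesise with the noisy phase."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    # Split the signal into overlapping, windowed frames
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mags, phases = np.abs(spectra), np.angle(spectra)
    # Average magnitude of the leading frames serves as the noise estimate
    noise_mag = mags[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mags - noise_mag, 0.0)  # half-wave rectify
    # Overlap-add resynthesis, reusing the original noisy phase
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phases), axis=1)
    out = np.zeros(len(noisy))
    for i, f in enumerate(clean_frames):
        out[i * hop:i * hop + frame_len] += f * window
    return out
```

Real systems refine this with smoothed noise tracking and over-subtraction factors to limit musical noise; the version above only shows the core idea.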
I am very interested in further research to improve the visual filtering approach, and would particularly welcome collaboration on the visual processing side (ROI detection and tracking, visual voice activity detection) and on the audiovisual speech modelling aspect. For the latter, we need to be able to predict the audio output from the equivalent visual information.
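The audio-from-visual prediction problem can be sketched in its simplest form as a regression from visual (lip) features to audio spectral features. The example below is purely illustrative, using synthetic data and an assumed linear mapping fitted by least squares; practical audiovisual models are far richer, but the input/output structure is the same.

```python
import numpy as np

# Hypothetical illustration: learn a linear map from per-frame visual
# (lip-shape) features to audio spectral features. All data is synthetic.
rng = np.random.default_rng(42)
n_frames, n_visual, n_audio = 500, 6, 20  # e.g. lip dims -> filterbank dims

V = rng.standard_normal((n_frames, n_visual))       # visual features per frame
W_true = rng.standard_normal((n_visual, n_audio))   # unknown "true" mapping
A = V @ W_true + 0.1 * rng.standard_normal((n_frames, n_audio))  # noisy targets

# Least-squares estimate of the visual-to-audio mapping
W_hat, *_ = np.linalg.lstsq(V, A, rcond=None)
A_pred = V @ W_hat  # predicted audio features for each frame
```

In a speech enhancement context, such predicted audio features can act as a clean-speech prior when the observed audio is heavily corrupted.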
I carried out this research under the supervision of Prof. Amir Hussain, and continue to work with him on future projects as a member of the COSIPRA Laboratory. More information about my research can be found in my PhD thesis (see below). We will be developing this work further with the funding of a new EPSRC research grant.
I was employed as a Researcher on the MEMS/CMOS microphone project (EPSRC Grant EP/G062609/1) between 2012 and 2014, initially testing and evaluating new types of microphone. This involved designing test setups able to detect output from experimental microphones, which required some hardware knowledge to construct, programming in LabVIEW, and data manipulation in MATLAB.
In addition, while investigating sound features we developed a speech segmentation approach; the resulting paper can be found in the publications section.
In addition to the audiovisual approach detailed above, I am also interested in how the human brain handles audiovisual processing in order to detect and hear speech. This includes auditory illusions such as the McGurk effect, visual cues that improve speech perception, lip-reading, and attention shifting to make optimal use of the input modalities. Of particular interest is how to transfer this information into cognitively inspired speech filtering systems.
I have developed a system that takes a nuanced approach to audiovisual speech processing, using visual information only when it is available and considered appropriate. The system assesses the level of the audio input and the quality of the visual information, and adjusts its processing decision on a frame-by-frame basis to produce cognitively inspired filtered speech.
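The frame-by-frame decision logic can be caricatured as a simple rule set. The thresholds and category names below are hypothetical placeholders, not the values used in my system (which employs fuzzy logic rather than hard thresholds), but they convey the idea of switching between processing modes per frame.

```python
def select_processing(frame_energy_db, visual_quality,
                      noise_floor_db=-40.0, vq_threshold=0.5):
    """Toy frame-by-frame decision rule (hypothetical thresholds):
    use audiovisual filtering only when the audio is poor AND usable
    visual information is available; otherwise fall back to audio-only
    filtering or pass the frame through unprocessed."""
    decisions = []
    for energy, vq in zip(frame_energy_db, visual_quality):
        if energy > noise_floor_db + 20:   # audio well above the noise floor
            decisions.append("passthrough")
        elif vq >= vq_threshold:           # noisy audio, but lips are visible
            decisions.append("audiovisual")
        else:                              # noisy audio and no usable video
            decisions.append("audio-only")
    return decisions
```

Replacing the hard thresholds with fuzzy membership functions yields graded, context-aware decisions rather than abrupt mode switches.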
I have given a number of presentations concerning cognition, most recently two keynote presentations at IJCNN 2013. Slides are available at:
Cognitive Computation: A Case Study in Cognitive Control of Autonomous Systems and Some Future Directions (PowerPoint). This focuses on autonomous vehicles and how they can tie into the creation of a cognitive being.
Cognitive Computation (PowerPoint). This focuses on different cognitively inspired systems, with a particular emphasis on the cognitive aspects of speech.
As part of recent research, an audiovisual speech corpus containing challenging noisy audio and visual data was created. This corpus is now available for download. Please visit the ChallengAV page: ChallengAV
Towards an Intelligent Fuzzy Based Multimodal Two Stage Speech Enhancement System. This thesis presents a novel two-stage multimodal speech enhancement system that uses both visual and audio information to filter speech. It also explores extending this system with fuzzy logic, as a proof of concept for an envisaged autonomous, adaptive, and context-aware multimodal system.
My full list of publications can be found on my publications page.