Sounds in real environments arise from many concurrent sources, with reverberation-induced smearing altering their spectrotemporal content. Ultimately, sound (including speech) arrives at the ear or the microphone as a time-varying signal whose components of interest lie roughly between 20 and 15,000 Hz: a mixture of all the sound sources, smeared by multiple reflections. How this signal should be processed to provide input to an interpreting system (which would presumably prefer to interpret only the signal of interest) is very much a matter of debate, particularly between those who use traditional MFCC techniques and those who prefer something more neurally inspired, such as sets of spike trains, perhaps together with feature detectors. How should the sounds from a particular source of interest be segregated? How can interpretation be made invariant to listening conditions? Whatever techniques are used, the result is a time series of some kind, and many neural techniques are of potential interest for different aspects of this problem, from deep neural networks to learning-based spiking neural systems.
This session builds on the IJCNN Special Session in 2011, organised by the late Harry Erwin.
Paper submission: January 15th, 2015 (extended to February 5th, 2015)
Paper decision notification: March 15th, 2015 (extended to March 25th, 2015)
Camera-ready submission: April 15th, 2015 (extended to April 25th, 2015)
Conference dates: July 12th–17th, 2015
Please contact the organisers if you have any questions at all!