Two-channel mixtures of speech and real-world background noise
Introduction
In recent years, recording devices such as voice recorders, smartphones, tablet-type mobile devices and laptop PCs have become readily available in our everyday environment, and exploiting them for array signal processing is an attractive scenario. In most cases, however, signals recorded with different devices are not synchronous: they include unknown time offsets at the start of recording as well as sampling frequency mismatches. The aim of this task is to evaluate source separation on such asynchronous channels.
Description of the datasets
The datasets consist of synthetic asynchronous recordings of speech mixtures captured with three stereo recording devices that are not synchronized. Recordings of static sources with synchronous channels are simulated by convolving the sources with measured impulse responses and adding uncorrelated white noise at an SNR of 60 dB. Random time offsets and slight sampling frequency mismatches are then applied artificially.
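The synthesis described above (convolution with a measured impulse response, then additive white noise at 60 dB SNR) can be sketched as follows. This is an illustrative reconstruction, not the script used to build the datasets; the toy source and impulse response are invented for the example.

```python
import numpy as np

def simulate_channel(source, rir, snr_db=60.0, rng=None):
    """Convolve a dry source with a room impulse response and add
    uncorrelated white noise at the given SNR (60 dB in this task)."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.convolve(source, rir)                 # spatial image of the source
    sig_power = np.mean(image ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise

# hypothetical toy data: 1 s of speech-like noise and a short decaying RIR
fs = 16000
src = np.random.default_rng(0).normal(size=fs)
rir = np.exp(-np.arange(2000) / 300.0) * np.random.default_rng(1).normal(size=2000)
obs = simulate_channel(src, rir)
```

In the real datasets this step is repeated per source and per channel before the time offsets and sampling frequency mismatches are imposed.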
The data cover two different recording environments:
- 150ms: all microphone elements are spaced in a linear arrangement. The spacing of each stereo microphone pair is about 2.15 cm, and the reverberation time is about 150 ms.
- 300ms: all microphone elements are spaced in a radial fashion. The spacing of each stereo microphone pair is about 7.65 cm, and the reverberation time is about 300 ms.
Test data
Download test.zip (18.8 MB)
The data consist of 18 stereo WAV audio files that can be imported into Matlab using the wavread command. These files are named
test_<srcset>_<cond>_mix_<ch>.wav, where
- <srcset>: the source sets male2, male3 and male4, which correspond to mixtures of two, three and four male speakers' utterances, respectively.
- <cond>: the recording conditions 150ms and 300ms.
- <ch>: the indexes of the stereo channels ch12, ch34 and ch56. The channels are synchronized within each file, but the files are not synchronized to each other.
Each combination of <srcset> and <cond> determines one source set. The source sets do not share the same time offsets, sampling frequency mismatches or source directions. The sampling frequency mismatches are smaller than 100 ppm (= 0.01 %).
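To give a sense of scale, a 100 ppm mismatch means the two device clocks drift apart by 100 microseconds for every second of audio, which accumulates over a recording. A small back-of-the-envelope helper (not part of the task materials):

```python
def drift_samples(duration_s, fs=16000, ppm=100):
    """Accumulated sample drift between two devices whose sampling
    rates differ by `ppm` parts per million."""
    return duration_s * fs * ppm * 1e-6

# over a 60-second recording at 16 kHz, a 100 ppm mismatch
# accumulates to about 96 samples (6 ms) of drift
print(drift_samples(60))
```

Even sub-0.01 % mismatches therefore cause misalignments of many samples over typical utterance lengths, which is why simple fixed-offset alignment is not sufficient.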
Development data
Download dev.zip (75.5 MB)
The development data consist of 66 stereo WAV audio files and 6 Matlab MAT files, which can be imported into Matlab using the load and wavread commands, respectively. These files are named as follows:
- dev_src_<src>.wav: single-channel speech signal, shared across the whole development data.
- dev_<srcset>_<cond>_<src>_<ch>.wav: two-channel spatial image of each source.
- dev_<srcset>_<cond>_mix_<ch>.wav: two-channel observed signal of each stereo channel pair.
- dev_<srcset>_<cond>_src_<src>.mat: MAT file containing the variable A of room impulse responses, whose size is [number of channels, number of sources, number of samples]. Note that the recording time offset is included in the impulse responses.
Here the variables are determined as follows.
- <srcset>: source sets male2, male3 and male4, which correspond to mixtures of two, three and four male speakers' utterances, respectively.
- <cond>: recording conditions 150ms and 300ms.
- <ch>: indexes of the stereo channels ch12, ch34 and ch56. The channels are synchronized within each file, but the files are not synchronized with each other.
- <src>: indexes of the sources.
In this development data, the data sets share the same time offsets and sampling frequency mismatches. All channels are originally sampled at 16 kHz; ch34 and ch56 are then resampled to 15999 Hz and 16001 Hz, respectively.
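The effect of such a mismatch can be reproduced by resampling a nominally 16 kHz signal onto the slightly different grid of the second device's clock. The sketch below uses linear interpolation as a rough stand-in for a proper polyphase resampler; it is an illustration of the concept, not the resampling code used to prepare the data.

```python
import numpy as np

def apply_sfm(x, fs_nominal=16000, fs_actual=15999):
    """Simulate a sampling-frequency mismatch: re-read a signal recorded at
    fs_nominal as if the device clock actually ran at fs_actual.
    Linear interpolation is a crude but illustrative resampler."""
    step = fs_nominal / fs_actual                 # >1 when the clock runs slow
    t_new = np.arange(0.0, len(x), step)          # new grid, in old-sample units
    t_new = t_new[t_new <= len(x) - 1]
    return np.interp(t_new, np.arange(len(x)), x)

# one second of a 440 Hz tone, resampled onto a 15999 Hz clock
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
y = apply_sfm(x)
```

The resampled channel ends up one sample shorter per second, which is exactly the slow drift that separation algorithms must compensate for.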
Task
The task is to estimate each source signal at the first channel from the mixtures. Because the channels can include unknown recording-start offsets and sampling frequency mismatches, each participant is requested to align the separated sources to the first channel.
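For the constant-offset part of the alignment, a standard starting point is to locate the peak of the cross-correlation between channels. This is a minimal sketch of that idea (integer lags only; it does not handle sampling frequency drift), not a prescribed method for the task:

```python
import numpy as np

def estimate_offset(ref, sig):
    """Estimate the integer time offset of `sig` relative to `ref` by
    finding the peak of their cross-correlation (positive = sig lags ref)."""
    corr = np.correlate(sig, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

# toy check: y is x delayed by 30 samples
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = np.concatenate([np.zeros(30), x])
print(estimate_offset(x, y))  # 30
```

Handling the sampling frequency mismatch additionally requires estimating a time-varying (linearly growing) lag, e.g. by tracking the correlation peak over short blocks.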
Submission
Each participant is asked to submit the results of his/her algorithm for the task described above over all or part of the mixtures in the development dataset and the test dataset.
Evaluation criteria
We plan to use the criteria defined in the BSS_EVAL toolbox. The submitted results will be evaluated in terms of SDR, SIR, SAR and ISR, using the original sources at the first channel as the argument "i" of bss_eval_images_nosort.m.
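As a rough illustration of what these criteria measure, a plain signal-to-distortion ratio can be computed as below. Note that this is only a simplified proxy: the actual BSS_EVAL criteria decompose the estimate into target, interference and artifact components (allowing a short distortion filter on the target), which this sketch does not.

```python
import numpy as np

def sdr(reference, estimate):
    """Plain signal-to-distortion ratio in dB (no distortion filter),
    a simplified proxy for the SDR reported by BSS_EVAL."""
    noise = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# toy check: a clean tone with a small amount of additive noise
ref = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
est = ref + 0.01 * np.random.default_rng(0).normal(size=ref.shape)
score = sdr(ref, est)
```

Because the references are the sources at the first channel, any residual misalignment of the submitted estimates directly lowers these scores, which is why the alignment requirement in the task description matters.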
Licensing issues
All files are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. The files to be submitted by participants will be made available on a website under the terms of the same license. The authors are Yuya Sugimoto and Shigeki Miyabe.
Task proposed by the Audio Committee.