History: Two-channel mixtures of speech and real-world background noise
Comparing version 112 with version 123
@@ -Lines: 5-8 changed to +Lines: 5-12 @@
!!Introduction
This task aims to evaluate denoising and DOA estimation techniques on the [http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise|SiSEC 2010 noisy speech dataset].
+
+ !! Results
+ *Results for the [http://www.onn.nii.ac.jp/sisec13/evaluation_result/BGN/homepage_BGN_dev.html|development dataset]
+ *Results for the [http://www.onn.nii.ac.jp/sisec13/evaluation_result/BGN/homepage_BGN_test.html|test dataset]
!!Description of the dataset
@@ -Lines: 17-26 changed to +Lines: 21-30 @@
* -+Sq2+-: square 2 (a different square from Sq1)
and in two different positions within each environment:
- * -+Ce+-: center (except in -+Su1+- and -+Su2+-)
* -+Co+-: corner
+ * -+Ce+-: center
* -+Co+-: corner (except in -+Su1+- and -+Su2+-)
Two recordings identified by a letter (A or B) were made in each case. Mixtures were then generated by adding a speech signal to the background noise signal. For the reverberant environments -+Su+- and -+Ca+-, the speech signals were recorded in an office room using the same microphone pair. For the outdoor environment -+Sq+-, the speech signals were mixed anechoically through simulation. The distance between the sound source and the array centroid was 1.0 m for female speech and 0.8 m for male speech. The direction of arrival (DOA) of the speech source was different in each mixture and the signal-to-noise ratio (SNR) was drawn randomly between -17 and +12 dB.
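The mixing step described above (adding a speech signal to the background noise at an SNR drawn between -17 and +12 dB) can be sketched as follows. This is an illustrative helper, not part of the official dataset tools, and the exact scaling used by the organizers is not specified here.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db` (in dB), then add the two signals.
    Illustrative sketch only; the official generation scripts may differ."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain applied to the noise so that 10*log10(p_speech / p_scaled) = snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

For a two-channel mixture, the same gain would be applied to both noise channels so that the spatial image of the noise is preserved.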
!!Test data
- __Download the [http://www.irisa.fr/metiss/SiSEC11/noise/test.zip|test set] (13 MB)__
+ __Download the [http://corpus-search.nii.ac.jp/sisec/2013/noise/test.zip|test set] (13 MB)__
The data consist of 20 stereo WAV audio files that can be imported in Matlab using the wavread command. These files are named -+test_<env>_<cond>_<take>_mix.wav+-, where
@@ -Lines: 29-35 changed to +Lines: 33-39 @@
* -+<take>+-: take ( -+A+-, -+B+-)
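The naming scheme above can be parsed programmatically. A minimal Python sketch is given below; the example filename follows the scheme but is an assumed instance, and the allowed field values are taken from the description on this page.

```python
import re

# Files are named test_<env>_<cond>_<take>_mix.wav, e.g. test_Sq1_Ce_A_mix.wav
# (example name constructed from the scheme, not verified against the archive).
PATTERN = re.compile(r"test_(?P<env>[A-Za-z0-9]+)_(?P<cond>Ce|Co)_(?P<take>[AB])_mix\.wav")

def parse_test_name(filename):
    """Return the (env, cond, take) fields of a test mixture filename."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a test mixture name: {filename}")
    return m.group("env"), m.group("cond"), m.group("take")
```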
!!Development data
- __Download the [http://www.irisa.fr/metiss/SiSEC11/noise/dev.zip|development set] (24 MB)__
+ __Download the [http://corpus-search.nii.ac.jp/sisec/2013/noise/dev.zip|development set] (11 MB)__
- The data consists of 40 WAV audio files and 10 text files. These files are named as follows:
+ The data consists of 36 WAV audio files and 10 text files. These files are named as follows:
* -+dev_<env>_<cond>_<take>_src.wav+-: single-channel speech signal
* -+dev_<env>_<cond>_<take>_sim.wav+-: two-channel spatial image of the speech source
@@ -Lines: 41-47 changed to +Lines: 45-49 @@
* -+<cond>+-: recording condition ( -+Ce+-, -+Co+-)
* -+<take>+-: take ( -+A+-, -+B+-)
- Since the source DOAs were measured geometrically in the -+Su1+- and -+Ca1+- environments, they might contain a measurement error up to a few degrees; on the contrary, there is no such error in the -+Sq+- environment, because the spatial images of the speech source were simulated.
- /> The mixtures dev_Ca1_Co_A_mix.wav and dev_Ca1_Co_B_mix.wav are identical (this is a mistake that will be corrected in future evaluations).
+ Since the source DOAs were measured geometrically in the -+Su1+- and -+Ca1+- environments, they might contain a measurement error up to a few degrees; on the contrary, there is no such error in the -+Sq+- environment, because the spatial images of the speech source were simulated. The -+Co+- condition of the -+Ca1+- environment has take -+A+- only.
!!Tasks and reference software
@@ -Lines: 55-66 changed to +Lines: 57-60 @@
* [http://sisec2008.wiki.irisa.fr/tiki-download_file.php?fileId=9|istft_multi.m]: multichannel inverse STFT
* [http://sisec2011.wiki.irisa.fr/tiki-download_file.php?fileId=3|example_denoising.m]: TDOA estimation by GCC-PHATmax, ML target and noise variance estimation under a diffuse noise model, and multichannel Wiener filtering
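The reference script -+example_denoising.m+- performs TDOA estimation by GCC-PHAT among other steps. For readers without Matlab, the core GCC-PHAT computation can be sketched in Python as below; this is a simplified illustration of the general technique, not a port of the reference script, and its sign convention (positive delay when channel 1 lags channel 2) is a choice made here.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the time difference of arrival between two channels via the
    PHAT-weighted generalized cross-correlation (simplified sketch)."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)
    # Restrict the search to physically plausible delays if max_tau is given.
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

The TDOA returned by such a routine can then be converted to a DOA estimate using the microphone spacing and the speed of sound.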
-
- Due to the specific construction of the dataset, at least four strategies may be employed to process the domestic environment mixtures:
-
- * 1. process each mixture (= 1 isolated sentence) alone
- * 2. process all mixtures with the same SNR (= 4 successive sentences without silence) together
- * 3. process the whole 5 min recording without knowledge of the sentence positions
- * 4. process the whole 5 min recording using knowledge of the sentence positions
- In any case, it is expected that the submitted signals correspond to the test mixtures (= isolated sentences).
!!Submission
@@ -Lines: 76-80 changed to +Lines: 70-74 @@
Each participant should then send an email to "onono (at) nii.ac.jp" and "zbynek.koldovsky (at) tul.cz" providing:
* contact information (name, affiliation)
- * basic information about his/her algorithm, including the __employed processing strategy__ among the four strategies outlined above, its average running time (in seconds per test excerpt and per GHz of CPU) and a bibliographical reference if possible
+ * basic information about his/her algorithm, including its average running time (in seconds per test excerpt and per GHz of CPU) and a bibliographical reference if possible
* the URL of the tarball