History: Two-channel mixtures of speech and real-world background noise
Comparing version 112 with version 123
@@ -Lines: 5-8 changed to +Lines: 5-12 @@
!!Introduction
This task aims to evaluate denoising and DOA estimation techniques on the [http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise|SiSEC 2010 noisy speech dataset].
+
+ !! Results
+ *Results for the [http://www.onn.nii.ac.jp/sisec13/evaluation_result/BGN/homepage_BGN_dev.html|development dataset]
+ *Results for the [http://www.onn.nii.ac.jp/sisec13/evaluation_result/BGN/homepage_BGN_test.html|test dataset]
!!Description of the dataset
@@ -Lines: 17-26 changed to +Lines: 21-30 @@
* -+Sq2+-: square 2 (a different square from Sq1)
and in two different positions within each environment:
- * -+Ce+-: center (except in -+Su1+- and -+Su2+-)
* -+Co+-: corner
+ * -+Ce+-: center
* -+Co+-: corner (except in -+Su1+- and -+Su2+-)
Two recordings identified by a letter (A or B) were made in each case. Mixtures were then generated by adding a speech signal to the background noise signal. For the reverberant environments -+Su+- and -+Ca+-, the speech signals were recorded in an office room using the same microphone pair. For the outdoor environment -+Sq+-, the speech signals were mixed anechoically through simulation. The distance between the sound source and the array centroid was 1.0 m for female speech and 0.8 m for male speech. The direction of arrival (DOA) of the speech source was different in each mixture and the signal-to-noise ratio (SNR) was drawn randomly between -17 and +12 dB.
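The mixing step described above (adding a speech signal to the background noise at an SNR drawn between -17 and +12 dB) can be sketched as follows. This is an illustrative helper, not part of the official dataset tools, and the exact scaling used by the organizers is not specified here.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db` (in dB), then add the two signals.
    Illustrative sketch only; the official generation scripts may differ."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain applied to the noise so that 10*log10(p_speech / p_scaled) = snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

For a two-channel mixture, the same gain would be applied to both noise channels so that the spatial image of the noise is preserved.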
!!Test data
- __Download the [http://www.irisa.fr/metiss/SiSEC11/noise/test.zip|test set] (13 MB)__
+ __Download the [http://corpus-search.nii.ac.jp/sisec/2013/noise/test.zip|test set] (13 MB)__
The data consist of 20 stereo WAV audio files that can be imported in Matlab using the wavread command. These files are named -+test_<env>_<cond>_<take>_mix.wav+-, where
@@ -Lines: 29-35 changed to +Lines: 33-39 @@
* -+<take>+-: take ( -+A+-, -+B+-)
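The naming scheme above can be parsed programmatically. A minimal Python sketch is given below; the example filename follows the scheme but is an assumed instance, and the allowed field values are taken from the description on this page.

```python
import re

# Files are named test_<env>_<cond>_<take>_mix.wav, e.g. test_Sq1_Ce_A_mix.wav
# (example name constructed from the scheme, not verified against the archive).
PATTERN = re.compile(r"test_(?P<env>[A-Za-z0-9]+)_(?P<cond>Ce|Co)_(?P<take>[AB])_mix\.wav")

def parse_test_name(filename):
    """Return the (env, cond, take) fields of a test mixture filename."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a test mixture name: {filename}")
    return m.group("env"), m.group("cond"), m.group("take")
```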
!!Development data
- __Download the [http://www.irisa.fr/metiss/SiSEC11/noise/dev.zip|development set] (24 MB)__
+ __Download the [http://corpus-search.nii.ac.jp/sisec/2013/noise/dev.zip|development set] (11 MB)__
- The data consists of 40 WAV audio files and 10 text files. These files are named as follows:
+ The data consists of 36 WAV audio files and 10 text files. These files are named as follows:
* -+dev_<env>_<cond>_<take>_src.wav+-: single-channel speech signal
* -+dev_<env>_<cond>_<take>_sim.wav+-: two-channel spatial image of the speech source
@@ -Lines: 41-47 changed to +Lines: 45-49 @@
* -+<cond>+-: recording condition ( -+Ce+-, -+Co+-)
* -+<take>+-: take ( -+A+-, -+B+-)
- Since the source DOAs were measured geometrically in the -+Su1+- and -+Ca1+- environments, they might contain a measurement error up to a few degrees; on the contrary, there is no such error in the -+Sq+- environment, because the spatial images of the speech source were simulated.
- /> The mixtures dev_Ca1_Co_A_mix.wav and dev_Ca1_Co_B_mix.wav are identical (this is a mistake that will be corrected in future evaluations).
+ Since the source DOAs were measured geometrically in the -+Su1+- and -+Ca1+- environments, they might contain a measurement error up to a few degrees; on the contrary, there is no such error in the -+Sq+- environment, because the spatial images of the speech source were simulated. The -+Co+- condition of the -+Ca1+- environment has take -+A+- only.
!!Tasks and reference software
@@ -Lines: 55-66 changed to +Lines: 57-60 @@
* [http://sisec2008.wiki.irisa.fr/tiki-download_file.php?fileId=9|istft_multi.m]: multichannel inverse STFT
* [http://sisec2011.wiki.irisa.fr/tiki-download_file.php?fileId=3|example_denoising.m]: TDOA estimation by GCC-PHATmax, ML target and noise variance estimation under a diffuse noise model, and multichannel Wiener filtering
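The reference script -+example_denoising.m+- performs TDOA estimation by GCC-PHAT among other steps. For readers without Matlab, the core GCC-PHAT computation can be sketched in Python as below; this is a simplified illustration of the general technique, not a port of the reference script, and its sign convention (positive delay when channel 1 lags channel 2) is a choice made here.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the time difference of arrival between two channels via the
    PHAT-weighted generalized cross-correlation (simplified sketch)."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)
    # Restrict the search to physically plausible delays if max_tau is given.
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

The TDOA returned by such a routine can then be converted to a DOA estimate using the microphone spacing and the speed of sound.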
-
- Due to the specific construction of the dataset, at least four strategies may be employed to process the domestic environment mixtures:
-
- * 1. process each mixture (= 1 isolated sentence) alone
- * 2. process all mixtures with the same SNR (= 4 successive sentences without silence) together
- * 3. process the whole 5 min recording without knowledge of the sentence positions
- * 4. process the whole 5 min recording using knowledge of the sentence positions
- In any case, it is expected that the submitted signals correspond to the test mixtures (= isolated sentences).
!!Submission
@@ -Lines: 76-80 changed to +Lines: 70-74 @@
Each participant should then send an email to "onono (at) nii.ac.jp" and "zbynek.koldovsky (at) tul.cz" providing:
* contact information (name, affiliation)
- * basic information about his/her algorithm, including the __employed processing strategy__ among the four strategies outlined above, its average running time (in seconds per test excerpt and per GHz of CPU) and a bibliographical reference if possible
+ * basic information about his/her algorithm, including its average running time (in seconds per test excerpt and per GHz of CPU) and a bibliographical reference if possible
* the URL of the tarball