History: Two-channel noisy recordings of a moving speaker within a limited area

Comparing version 10 with version 21


@@ -Lines: 1-3 changed to +Lines: 1-11 @@
!::Two-channel noisy recordings of a moving speaker within a limited area::
+
+ !! Motivation
+ This task focuses on a natural situation in which the target is a speaking person whose location is confined to a specific area. For example, the target could be a speaker seated in a (noisy) meeting room. The speaker is distant from the microphones (say, more than 1 meter), and the position changes due to small movements of the speaker's head. The goal is to remove typical noise (e.g., babble noise) from the recorded speech. We assume that two microphones are available.
+
+ For such a situation, a priori information may be provided in the form of noise-free recordings of the target from several (fixed) positions within the assumed area. For example, such recordings could be obtained during speaker-only intervals. How effectively can we exploit this a priori knowledge to enhance recordings of the speaker when noise is present and his/her position is not perfectly known and may change within the limited area?
+
+ !! Results
+ The results are available [http://www.onn.nii.ac.jp/sisec13/evaluation_result/MOV/MOV2013.htm|here]

!! Scenario

@@ -Lines: 13-114 changed to +Lines: 21-53 @@

Next, there are four recordings during which the loudspeaker was moved over four positions. A video of the first recording is available for illustration [http://itakura.ite.tul.cz/zbynek/dwnld/sisec2013/video.zip|here]. The file names have the format dev_<set>_<positions>_{sim,src,noi,mix}.wav, where <set> is the index of the recording (A, B, C, or D), <positions> contains the indices of the four positions passed during the movement, and {sim,src,noi,mix} denote, respectively, the target source images, the source signal of the target, the noise, and the noisy recording (sim+noi).
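The naming scheme above can be decoded programmatically. The following is a minimal Python sketch, not an official tool of the task. It assumes that <positions> is written as four integer indices separated by underscores (mirroring the x_x_x_x pattern used for the test files); the function name parse_dev_name and the example file name are hypothetical.

```python
import re

# Assumed layout: dev_<set>_<p1>_<p2>_<p3>_<p4>_<kind>.wav
# The underscore-separated integer positions are an assumption,
# not something the task description states explicitly.
DEV_NAME = re.compile(
    r"^dev_([A-D])_(\d+)_(\d+)_(\d+)_(\d+)_(sim|src|noi|mix)\.wav$"
)


def parse_dev_name(filename):
    """Split a development file name into (set, positions, kind)."""
    match = DEV_NAME.match(filename)
    if match is None:
        raise ValueError("not a development file name: " + filename)
    rec_set = match.group(1)
    positions = [int(p) for p in match.group(2, 3, 4, 5)]
    kind = match.group(6)  # sim, src, noi, or mix
    return rec_set, positions, kind


# Hypothetical file name used only for illustration.
print(parse_dev_name("dev_A_1_2_3_4_mix.wav"))
```

Grouping files by the returned set index makes it straightforward to pair each noisy recording (mix) with its target image (sim) and noise (noi) when computing development scores.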
+
!! Test dataset
+ __Download [http://itakura.ite.tul.cz/zbynek/dwnld/sisec2013/test16.zip|test16.zip] (3 MB) __ (Test dataset, 16 kHz, 16 bits)
+ __Download [http://itakura.ite.tul.cz/zbynek/dwnld/sisec2013/test44.1.zip|test44.1.zip] (12 MB) __ (Test dataset, 44.1 kHz, 24 bits)
+ The dataset contains five noisy recordings of the moving loudspeaker within the area. The file names have the format test_<set>_x_x_x_x_mix.wav, where <set> is the index of the recording (A, B, C, D, or E). Here, the trajectory of the movement is not revealed.
- The data consist of stereo WAV audio files, that can be imported in Matlab using the wavread command. These files are named {dev1,dev2}__~np~[~/np~<author>~np~]~/np~-~np~[~/np~<song>~np~]__[~/np~<snip>~np~]__~/np~{mix,full_mix,<track>}.wav, where <author> is the author name, <song> is the song name, <snip> is a shortcut for snip information, and <track> is the separated track name (e.g., "vocals", "bass", etc.).
-
- The data include the following mixtures (snips and full-length recordings):
-
- dev1
- * ~np~dev1__bearlin-roads__snip_85_99__mix.wav~/np~
- * ~np~dev1__tamy-que_pena_tanto_faz__snip_6_19__mix.wav~/np~
- dev2
- * ~np~dev2__another_dreamer-the_ones_we_love__snip_69_94__mix.wav~/np~
- * ~np~dev2__fort_minor-remember_the_name__snip_54_78__mix.wav~/np~
- * ~np~dev2__ultimate_nz_tour__snip_43_61__mix.wav~/np~
- dev2_full_mix
- * ~np~dev2__another_dreamer-the_ones_we_love__full_mix.wav~/np~
- * ~np~dev2__fort_minor-remember_the_name__full_mix.wav~/np~
- * ~np~dev2__ultimate_nz_tour__full_mix.wav~/np~
-
- Separated tracks (needed for evaluation in dev1 and dev2) are in the corresponding folders named ~np~{dev1,dev2}__[<author>]-[<song>]__<snip>__<track>.wav~/np~.
- !! License
- All audio files are distributed under the terms of different licenses, as listed below for each recording:
- * Tamy - Que Pena Tanto Faz: [http://creativecommons.org/licenses/by-nc/3.0/|Creative Commons Attribution Noncommercial (3.0)]
- * Bearlin - Roads: [http://www.irisa.fr/metiss/SiSEC10/SiSEC_professional/bearlin-roads__license.txt|Read License]
- * Glen Philips - The Spirit of Shackleton [http://creativecommons.org/licenses/by-nc/3.0/|Creative Commons Attribution 3.0]
- * Nine Inch Nails - The Good Soldier [http://www.irisa.fr/metiss/SiSEC10/SiSEC_professional/nine_inch_nails-the_good_soldier__license.txt|Read License]
- * Shannon Hurley - Sunrise [http://creativecommons.org/licenses/by-nc/3.0/|Creative Commons Attribution-NonCommercial 3.0]
- * Another Dreamer - The Ones We Love [http://creativecommons.org/licenses/by-nc/1.0/|Creative Commons Attribution-NonCommercial 1.0]
- * Fort Minor - Remember the Name [http://creativecommons.org/licenses/by-nc/2.5/|Creative Commons Attribution-NonCommercial 2.5]
- * Ultimate NZ Tour [http://creativecommons.org/licenses/by-nc/3.0/|Creative Commons Attribution-Noncommercial-ShareAlike 3.0]
- * Jims Big Ego – Mix tape [http://creativecommons.org/licenses/by-sa/1.0/|Creative Commons Attribution-ShareAlike 1.0]
- * Vieux Farka Touré – Ana [http://creativecommons.org/licenses/by-nc/2.5/|Creative Commons Attribution-NonCommercial 2.5]
-
- The data were taken from the [http://www.mtg.upf.edu/static/mass/resources|MTG MASS database] and from the [http://www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/|QUASI database].
!! Tasks
- The following should be taken into account:
- * The participants are encouraged to separate only the snips of songs in case of test1, test2, dev1, and dev2. In case of test3, the participants are encouraged to separate both snips and full-length recordings.
- * Some track names below have the following meaning:
- ** "vocals" = "a sum of any singing including main vocal, back vocals and singing in the reverb"
- ** "drums" = "a sum of any drums including bass drum, hi-hat, snare etc."
- ** "bass" = "bass guitar only (i.e., not bass drum)"
+ The participants are encouraged to submit:
+ * Enhanced (de-noised) versions of the test as well as the development noisy recordings
+ * Estimated trajectories of the loudspeaker in terms of sequences of indices of positions (mandatory)
+ !!Submissions
+ Each participant should make his/her results available online in the form of a tarball called <YourName>_<dataset>.zip.
+ The files containing the enhanced utterances should be named: <dataset>_<set>_x_x_x_x_enh.wav
+ where <dataset> is either dev or test, <set> is A, B, C, D, or E, and x_x_x_x are the estimated positions of the target during the movement.
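As an illustration of the naming rule for enhanced files, here is a minimal Python sketch that builds the file name from an estimated trajectory. The function name enhanced_filename is hypothetical, and the sketch assumes that exactly four estimated position indices are reported per recording, as the x_x_x_x pattern suggests.

```python
def enhanced_filename(dataset, rec_set, positions):
    """Build <dataset>_<set>_x_x_x_x_enh.wav from estimated positions."""
    if dataset not in ("dev", "test"):
        raise ValueError("dataset must be 'dev' or 'test'")
    # Development recordings are indexed A-D, test recordings A-E.
    valid_sets = "ABCD" if dataset == "dev" else "ABCDE"
    if len(rec_set) != 1 or rec_set not in valid_sets:
        raise ValueError("invalid set index for " + dataset + ": " + rec_set)
    # Assumption: one estimated index for each of the four positions.
    if len(positions) != 4:
        raise ValueError("expected four estimated position indices")
    trajectory = "_".join(str(int(p)) for p in positions)
    return dataset + "_" + rec_set + "_" + trajectory + "_enh.wav"


# Hypothetical estimated trajectory for test recording E.
print(enhanced_filename("test", "E", [2, 3, 1, 4]))
```

Encoding the estimated trajectory in the file name means the mandatory trajectory estimate is delivered together with the enhanced audio, with no separate results file.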
- !!! Tracks to separate (test tasks)
__~np~test1__tamy-que_pena_tanto_faz__snip__mix.wav~/np~__
* vocals, guitar
__~np~test1__bearlin-roads__snip__mix.wav~/np~__
* vocals, bass, drums, piano
__~np~test2__glen_philips-the_spirit_of_shackleton__snip_163_185__mix.wav~/np~__
* vocals, drums, bass, other
__~np~test2__nine_inch_nails-the_good_soldier__snip_104_125__mix.wav~/np~__
* bass, drums, vocals, other
__~np~test2__shannon_hurley-sunrise__snip_62_85__mix.wav~/np~__
* vocals, drums, bass, piano
__~np~test3__jims_big_ego-mix_tape__{snip, full_mix}__mix.wav~/np~__
* vocals, drums, bass, other
__~np~test3__vieux_farka_toure-ana__{snip, full_mix}__mix~/np~__
* vocals, drums, bass, other
!!! Tracks to separate (development tasks)
__~np~dev2__another_dreamer-the_ones_we_love__snip_69_94__mix.wav~/np~__
* vocals, drums, guitar
__~np~dev2__fort_minor-remember_the_name__snip_54_78__mix.wav~/np~__
* vocals, drums, bass, claps
__~np~dev2__ultimate_nz_tour__snip_43_61__mix.wav~/np~__
* vocals, drums, bass
!! Submission
Participants may submit separation results for any above-mentioned tracks of the test and development mixtures.
In addition, each participant is asked to provide basic information about his/her algorithm (e.g. a bibliographical reference) and to declare its average running time, expressed in seconds per test excerpt and per GHz of CPU.

!!!How to submit
Each participant should make his results available online in the form of a tarball called <YourName>_<dataset>.zip.
The included files must be named as follows:
<dataset>__<author>-<song>__<snip or full_mix>__<trackname>.wav
where <dataset> is one of the test/test2/dev2, <filename> is a shortcut for the set of source signals, <trackname> is the name of the extracted track. For example, the estimated vocal track for the task file "test2_glen_philips-the_spirit_of_shackleton_snip_163_185_mix.wav" should be named as "test2__glen_philips-the_spirit_of_shackleton__snip_163_185__vocals.wav".

Each participant should then send an email to "zbynek.koldovsky (at) tul.cz" and to "onono (at) nii.ac.jp" providing:
+ Each participant should then send an email to "zbynek.koldovsky (at) tul.cz" providing:
*contact information (name, affiliation)
*basic information about his/her algorithm, including its average running time (in seconds per test excerpt and per GHz of CPU) and a bibliographical reference if possible
*the URL of the tarball(s)
- The submitted audio files will be made available on a website under the terms of the same license as indicated in the License section above. In other words, any modified version inherits exactly the same license as the original.
!! Evaluation criteria
The evaluation will be done through the perceptual evaluation toolkit [http://bass-db.gforge.inria.fr/peass/|PEASS v.2.0].
- !! Potential participants
- * M Vinyes Rasor
- * Vasileios Pantazis
- * Alexey Ozerov (alexey.ozerov (a) irisa_fr)
- * Jeanlouis Durrieu (durrieu (a) enst_fr)
- * Maximo Cobos (mcobos (a) iteam_upv_es)
- * Pablo Cancela (pcancela (a) gmail.com)
- * Antoine Liutkus (antoine.liutkus (a) telecom-paristech.fr)
- * Pierre Leveau (pierre.leveau (a) audionamix.com)
- * Jordi Janer (jordi.janer (a) upf.edu)
- * Nobutaka Ono (onono (a) nii.ac.jp)
- * Stanislaw Gorlow
- Task proposed by Audio Committee
+ !!Licensing issues
All files are distributed under the terms of the [http://creativecommons.org/licenses/by-nc/3.0/|Creative Commons Attribution-Noncommercial-ShareAlike 3.0] license. The files to be submitted by participants will be made available on a website under the terms of the same license.
- [./tiki-index.php?page=Audio+source+separation|Back to Audio source separation top]
+ The recordings are authored by Emmanuel Vincent, Zbynek Koldovsky, and Jiri Malek.
+ [./tiki-index.php?page=Audio+source+separation|Back to Audio source separation top]

History

Date, User, Version
Fri 26 of July, 2013 09:26 CEST, admin, 21 (current)
Fri 29 of Mar., 2013 10:57 CET, admin, 20
Fri 29 of Mar., 2013 10:55 CET, admin, 19
Fri 29 of Mar., 2013 10:54 CET, admin, 18
Fri 29 of Mar., 2013 10:45 CET, admin, 17
Wed 27 of Mar., 2013 10:51 CET, admin, 16
Wed 27 of Mar., 2013 10:47 CET, admin, 15
Wed 27 of Mar., 2013 10:45 CET, admin, 14
Wed 27 of Mar., 2013 10:44 CET, admin, 13
Wed 27 of Mar., 2013 10:37 CET, admin, 12
Wed 27 of Mar., 2013 10:34 CET, admin, 11
Wed 27 of Mar., 2013 10:32 CET, admin, 10
