L3DAS23 - Results

Task 1

3D Speech Enhancement in Simulated Reverberant Environment

In Task 1, the models are expected to extract the monophonic voice signal from a 3D mixture that contains various background noises. The evaluation metric for this task, the T1 metric, combines the short-time objective intelligibility (STOI), which estimates the intelligibility of the output speech signal, with the word error rate (WER), which assesses the effect of the enhancement on speech recognition. The T1 metric lies in the range [0,1]; higher values are better.

L3DAS23 Challenge results for Task 1 are depicted in the following interactive chart.

The table below shows the L3DAS23 Challenge rank for Task 1, including all the scores for the sake of comparison.

Rank | Team Name    | WER   | STOI  | T1 Metric | Notes
1    | SEU Speech   | 0.101 | 0.902 | 0.901     | Results obtained with a 2-mic configuration
2    | JLESS        | 0.174 | 0.836 | 0.831     | Results obtained with a 2-mic configuration
3    | CCA Speech   | 0.240 | 0.831 | 0.796     | Results obtained with a 2-mic configuration
-    | Baseline     | 0.567 | 0.673 | 0.553     |
4    | SpeechLab410 | 0.643 | 0.608 | 0.483     | Results obtained with a 2-mic configuration
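The text above describes the T1 metric only as "a combination" of STOI and WER. The sketch below assumes that the combination is the mean of STOI and (1 - WER), with WER clipped to [0,1]; this is an assumption, not a formula stated on this page, although it reproduces every T1 value in the table to within rounding. The names `TASK1_SCORES` and `t1_metric` are illustrative.

```python
# Scores copied from the Task 1 table: team -> (WER, STOI, reported T1 metric).
TASK1_SCORES = {
    "SEU Speech": (0.101, 0.902, 0.901),
    "JLESS": (0.174, 0.836, 0.831),
    "CCA Speech": (0.240, 0.831, 0.796),
    "Baseline": (0.567, 0.673, 0.553),
    "SpeechLab410": (0.643, 0.608, 0.483),
}


def t1_metric(wer: float, stoi: float) -> float:
    """Assumed combination: the mean of STOI and (1 - WER)."""
    # Assumption: WER is clipped to [0, 1] so the result stays in [0, 1].
    wer = min(max(wer, 0.0), 1.0)
    return (stoi + (1.0 - wer)) / 2.0


for team, (wer, stoi, reported) in TASK1_SCORES.items():
    computed = t1_metric(wer, stoi)
    # The table values are rounded to three decimals, so allow a small tolerance.
    assert abs(computed - reported) < 1e-3, team
    print(f"{team}: computed {computed:.4f}, reported {reported:.3f}")
```

Under this assumption, a lower WER and a higher STOI both raise the T1 metric, which matches the ordering of the teams in the table.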

Task 2

3D Sound Event Localization and Detection in Simulated Reverberant Environments

In Task 2, the models are expected to predict a list of the active sound events and their respective locations at regular intervals of 100 milliseconds. The evaluation metric for this task, the T2 metric, is a location-sensitive detection score computed on each time frame: the Cartesian distance between the predicted and true events is measured to decide whether a prediction counts as correct, and the F score is then computed from the resulting precision and recall. The T2 metric lies in the range [0,1]; higher values are better.
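As a rough illustration of location-sensitive detection on a single time frame, the sketch below counts a prediction as a true positive only when a true event of the same class lies within a distance threshold. The threshold value, the greedy nearest-match strategy, and the names `DISTANCE_THRESHOLD` and `count_frame_matches` are all assumptions made for this example; the challenge's actual matching rules are not given on this page.

```python
import math

# Hypothetical threshold in metres; the actual challenge value is not stated here.
DISTANCE_THRESHOLD = 1.0


def count_frame_matches(predicted, true, threshold=DISTANCE_THRESHOLD):
    """Count true positives, false positives and false negatives for one frame.

    Each event is a (class_label, (x, y, z)) tuple.
    """
    unmatched_true = list(true)
    tp = 0
    for label, position in predicted:
        # Find the closest still-unmatched true event of the same class.
        candidates = [
            (math.dist(position, true_pos), index)
            for index, (true_label, true_pos) in enumerate(unmatched_true)
            if true_label == label
        ]
        if candidates:
            distance, index = min(candidates)
            if distance <= threshold:
                tp += 1
                unmatched_true.pop(index)
    fp = len(predicted) - tp   # predictions with no close enough true event
    fn = len(unmatched_true)   # true events that no prediction matched
    return tp, fp, fn


predicted = [("speech", (1.0, 0.0, 0.0)), ("dog", (0.0, 2.0, 0.0))]
true = [("speech", (1.2, 0.0, 0.0)), ("dog", (0.0, -2.0, 0.0))]
print(count_frame_matches(predicted, true))  # prints (1, 1, 1)
```

In the example frame, the speech event is localized within the threshold and counts as a true positive, while the dog event is detected but placed too far away, so it contributes both a false positive and a false negative.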

L3DAS23 Challenge results for Task 2 are depicted in the following interactive chart.

The table below shows the L3DAS23 Challenge rank for Task 2, including all the scores, for the sake of comparison.

Rank* | Team Name     | Precision | Recall | T2 Metric | Notes
1     | JLESS         | 0.288     | 0.204  | 0.239     | Results obtained with a 2-mic configuration
2     | NERCSLIP-USTC | 0.275     | 0.216  | 0.242     | Results obtained with a 2-mic configuration
-     | Baseline      | 0.182     | 0.140  | 0.158     | Results obtained with a 2-mic configuration

* The quality of the results produced, as well as the T2 metric, was considered to define the ranking order.
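The T2 values in the table are consistent with the standard F score, the harmonic mean of precision and recall. The short check below recomputes it from the reported precision and recall; the names `TASK2_SCORES` and `f_score` are illustrative, and the tolerance accounts for the table values being rounded to three decimals.

```python
# Scores copied from the Task 2 table: team -> (precision, recall, reported T2 metric).
TASK2_SCORES = {
    "JLESS": (0.288, 0.204, 0.239),
    "NERCSLIP-USTC": (0.275, 0.216, 0.242),
    "Baseline": (0.182, 0.140, 0.158),
}


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


for team, (precision, recall, reported) in TASK2_SCORES.items():
    computed = f_score(precision, recall)
    # The table values are rounded to three decimals, so allow a small tolerance.
    assert abs(computed - reported) < 1e-3, team
    print(f"{team}: computed {computed:.4f}, reported {reported:.3f}")
```

Note that the first-ranked team has a slightly lower T2 value than the second-ranked team; as the footnote states, the ranking also took the quality of the produced results into account.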

The list of results, organized by Task, Track, and 1-mic or 2-mic configuration, is available at this link.

Benefits

Based on the challenge results, the following benefits are assigned.

Papers at ICASSP 2023

The five top-ranked teams are allowed to submit a paper to ICASSP 2023. Given the number of submissions to the two tasks, we accept papers from all the teams, as listed in the following table. Papers will undergo a regular peer-review process. The format should be consistent with the ICASSP 2-page paper format. The deadline for paper submission is February 20 at 11:59 p.m. AoE (strict deadline).

Ranking Position | Task                   | Team
1st              | 1: 3D SE               | SEU Speech
2nd              | 1: 3D SE + 2: 3D SELD  | JLESS
3rd              | 1: 3D SE               | CCA Speech
4th              | 2: 3D SELD             | NERCSLIP-USTC
5th              | 1: 3D SE               | SpeechLab410

CO2 impact

The following table shows the training hours and the related CO2 emissions in kg CO2 eq., as reported by participants during submission.

Team Name     | CO2 Best Model   | Training hours | Total CO2        | Total Training hours
SEU Speech    | 14.11 kg CO2 eq. | 90 h           | 14.11 kg CO2 eq. | approx. 90 h
CCA Speech    | 11.19 kg CO2 eq. | 74 h           | 22.9 kg CO2 eq.  | approx. 180 h
SpeechLab410  | 12.20 kg CO2 eq. | 121 h          | 13.96 kg CO2 eq. | approx. 139 h
NERCSLIP-USTC | 0.54 kg CO2 eq.  | 5 h            | 16.74 kg CO2 eq. | approx. 155 h
JLESS         | 12.20 kg CO2 eq. | 121 h          | 18.15 kg CO2 eq. | approx. 120 h
