L3DAS23 - Submission

Submission platform

Results and papers can be submitted via the appropriate Google Form by completing the following steps:

  1. Insert the Team name, as indicated in the registration form.
  2. Insert the email address of the Team captain, as indicated in the registration form.
  3. Insert the email addresses of the other team members.
  4. Select the tasks and tracks you are participating in.
  5. Upload the 2-page report in a PDF format.
  6. Upload a .zip file for the results of the audio-only track, if any (read below).
  7. Upload a .zip file for the results of the audio-visual track, if any (read below).
  8. Upload a text file describing the environmental impact of the proposed models in terms of CO2 produced (read below).
  9. (Optional) Upload a .zip file containing your code.
  10. (Optional) Upload an additional PDF file containing further details about the implemented method.
  11. You can edit the submission before the challenge deadline (Feb 19, 11:59 pm AoE Time) by saving the link shown after the first submission. If you lose the link, you can proceed with a brand-new submission: each new submission will replace the one(s) previously made.

You should receive a confirmation email after the first submission.

Note: You may be required to log in with a Google account to fill out the form. Any Google account is allowed, even if not connected to any team participant (we will rely only on the information submitted within the form). Please read below for an alternative method to use in case of geographic restrictions.

How to format the 2-page report

Each participating team must submit a 2-page report describing their work. While not strictly necessary, we strongly recommend structuring the report as a 2-page paper following ICASSP guidelines. Indeed, the top 5 teams will be invited to submit a 2-page paper to ICASSP 2023, with a strict deadline of Feb. 20, 2023. 

Note that a 2-page paper at ICASSP 2023 should follow the guidelines below:

  1. Use the same template for regular ICASSP papers, but with a page length of max 2 pages (instead of 4+1). This includes everything (title, abstract, intro, references, and possible figures and tables).
  2. The abstract and introduction must clearly mention that this work is done in the context of an “ICASSP Signal Processing Grand Challenge” and include (a) the official challenge name, (b) the year of the challenge, and, if applicable, (c) the edition number of the challenge.
  3. The paper is free format, but must not exceed 2 pages (including references), and should contain an abstract and (brief) introduction.
  4. The introduction should at least contain:
    • a brief description of the scope of the challenge (+ challenge name, see above)
    • a brief description of your proposed solution (its main ingredients)
    • the quantitative results you obtained on the challenge’s evaluation metrics.
  5. It is not necessary to exhaustively describe prior art in the introduction. However, standard citation rules remain applicable. E.g. if your solution is inspired by (or uses) existing work, cite it properly.
  6. In the main text, focus on the conceptual implementation/innovation, and a high-level description of the proposed solution.
  7. In general: make the paper as self-contained as possible. Yet, keep in mind that you still have the opportunity to submit a full journal paper to OJSP if you feel your methodology is sufficiently innovative to be published as a journal paper.

How to format the results for the submission

All participants must submit only the results obtained on the blind test set, which was released on Feb 05, 2023. The submission must contain up to two zip archives (max 10 GB each), one for the audio-only track and one for the audio-visual track. Each archive encloses two separate folders for the challenge tasks, named task1 and task2. If your team competes in only one of the two tasks, your zip archive needs to contain only the corresponding task1 or task2 folder; similarly, if you compete in only one of the two tracks, you should send only one zip archive. Each task folder should in turn contain two subfolders: 1mic, containing the results obtained with the one-mic configuration, and 2mic, containing the results obtained with the two-mic configuration. Following multiple requests from participants, we will also accept submissions that contain only one of the two subfolders (1mic or 2mic).

The zip archives should be titled {group_name}_{audiovisual/audioonly}.zip.

As an example, a team named "DnD" that wants to participate in the audio-visual track of Task 1 and in the audio-only track of both Task 1 and Task 2 will submit two ZIP archive files titled DnD_audiovisual.zip and DnD_audioonly.zip, having the following structure:

  • DnD_audiovisual
    • task1
      • 1mic
        • [result_files]
      • 2mic
        • [result_files]
  • DnD_audioonly
    • task1
      • 1mic
        • [result_files]
      • 2mic
        • [result_files]
    • task2
      • 1mic
        • [result_files]
      • 2mic
        • [result_files]
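
As a minimal sketch, the layout above could be packaged with Python's standard zipfile module. The function name build_archive and its arguments are hypothetical, and placing task1 and task2 at the root of the archive is an assumption about how the structure is read.

```python
import zipfile
from pathlib import Path

ALLOWED_TASKS = {"task1", "task2"}
ALLOWED_MICS = {"1mic", "2mic"}
ALLOWED_TRACKS = {"audioonly", "audiovisual"}


def build_archive(results_dir, group_name, track, out_dir="."):
    """Zip results_dir/<task>/<mic>/<file> into {group_name}_{track}.zip."""
    if track not in ALLOWED_TRACKS:
        raise ValueError(f"unknown track: {track}")
    results_dir = Path(results_dir)
    # List the files first, so the archive itself is never picked up.
    files = sorted(p for p in results_dir.rglob("*") if p.is_file())
    for file_path in files:
        parts = file_path.relative_to(results_dir).parts
        # Every file must sit exactly at <task>/<mic>/<result file>.
        if (len(parts) != 3
                or parts[0] not in ALLOWED_TASKS
                or parts[1] not in ALLOWED_MICS):
            raise ValueError(f"unexpected path in results: {'/'.join(parts)}")
    archive_path = Path(out_dir) / f"{group_name}_{track}.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            relative = file_path.relative_to(results_dir)
            archive.write(file_path, arcname=relative.as_posix())
    return archive_path
```

For the example team, build_archive("results_audioonly", "DnD", "audioonly") would produce DnD_audioonly.zip from a local folder (here hypothetically named results_audioonly) that holds the task1 and task2 folders.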

Each task folder should contain the results obtained for each data point as individual files, which must have the same names as the predictor files of the blind test set. Apart from the naming, the format and content of the result files differ between the two tasks.

Task 1

For Task 1 the models must predict monaural sound waveforms containing the enhanced speech signals extracted from the multichannel noisy mixtures. Each submitted file for this task should be a one-dimensional NumPy array (.npy file) enclosing the floating-point samples of the predicted speech waveform, with a sampling rate of 16 kHz.
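
A minimal sketch of writing one Task 1 result file with NumPy follows. The function name save_task1_prediction and the float32 dtype are illustrative assumptions; the requirement stated here is only a one-dimensional floating-point array sampled at 16 kHz.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, the sampling rate required for Task 1


def save_task1_prediction(waveform, out_path):
    """Save an enhanced waveform as a 1-D floating-point .npy file."""
    # float32 is an assumption; any floating-point dtype fits the description.
    samples = np.asarray(waveform, dtype=np.float32)
    # Drop singleton axes, e.g. a leading channel dimension of size 1.
    samples = np.squeeze(samples)
    if samples.ndim != 1:
        raise ValueError(f"expected a mono waveform, got shape {samples.shape}")
    np.save(out_path, samples)
    return samples
```

The output path should end with the name of the corresponding predictor file, so that each result can be matched to its blind-test data point.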

Task 2

For Task 2 the models are expected to predict the spatial coordinates and class of the sound events active in a multichannel audio mixture. Such information must be generated for each frame in a discrete temporal grid of non-overlapping 100-millisecond frames.
Each submitted file for this task should be a csv table listing, for every time frame, the class and spatial coordinates of each predicted sound event. Only time frames with active sounds should be included (do not include time frames with no sounds predicted). Please use the following format for each row of the table:

[time frame number (starting from 0)] [class] [ρ] [θ] [z]

where class should be a string containing the sound class name (using the same names as the original dataset) and ρ, θ and z should be floats describing the corresponding spatial coordinates.
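
The row format could be produced as in the following sketch, which expands each predicted event into one row per 100 ms frame. Several points are assumptions: the event tuple layout, the comma delimiter, the rule that a frame counts as active whenever the event overlaps it, and the class name "Speech", which is only a placeholder for a name from the dataset. Times are kept in integer milliseconds so that frame indices are not affected by floating-point rounding.

```python
import csv

FRAME_MS = 100  # frame length in milliseconds


def events_to_rows(events):
    """Expand (onset_ms, offset_ms, class, rho, theta, z) events into frame rows."""
    rows = []
    for onset_ms, offset_ms, sound_class, rho, theta, z in events:
        first = onset_ms // FRAME_MS
        # Last frame that still contains part of the event (assumed rule).
        last = (offset_ms - 1) // FRAME_MS
        for frame in range(first, last + 1):
            rows.append([frame, sound_class, float(rho), float(theta), float(z)])
    # Order rows by frame number; frames without events never appear.
    rows.sort(key=lambda row: row[0])
    return rows


def write_task2_csv(rows, out_path):
    """Write one row per active event and frame (comma delimiter assumed)."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
```

For instance, an event lasting from 0 ms to 250 ms yields rows for frames 0, 1 and 2, while silent frames are simply omitted.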

We provide functions for testing the validity of the submission files as part of our API, as detailed in our GitHub repository. An example submission folder is also provided below.

Example submission folder [ZIP]

Optional material

Optionally, you can choose to submit the code you used to get your results. This can help us speed up the authenticity checks, should they be necessary. Note that the intellectual property (IP) is not transferred to the challenge organizers and you remain the author(s) of your work.

Moreover, in addition to the 2-page report, participants can also submit an additional PDF file containing further details about the implemented method. This additional material will not be considered by default, but it may be used to better assess the novelty of the proposed methods in case of ties between different teams.

Your CO2 impact

Training deep learning models is a highly impactful activity in terms of energy consumed and pollutants produced. Although it is not easy, we should all strive to optimize resources and training sessions.

All participants must prepare a written document in which they specify, for each DL model, an estimate of the training hours, the hardware used, and an estimate of the CO2 produced (computed with online tools, like ML CO2 Impact).

Finally, the document must contain an overall estimate of the number of training hours used in all the various phases of development (unsuccessful experiments included).

Example: assume that a participant submitted a Model A for Task 1, which was trained with an Nvidia Tesla V100 for 72 hours, and a Model B for Task 2, trained with an Nvidia Quadro RTX 8000 for 100 hours. In the various tests performed before reaching the final versions of Models A and B, the same participant also used the V100 for approximately 150 hours (222 h in total) and the RTX 8000 for approximately 250 hours (350 h in total). Such a participant should include in the written document:

Model A (Task 1, Track audioonly): 72 h, NVIDIA Tesla V100 PCIe, 7.6 kg CO2 eq.
Model B (Task 2, Track audiovisual): 100 h, NVIDIA RTX 8000, 9.15 kg CO2 eq.

Total use:
approx. 222 h, NVIDIA Tesla V100 PCIe, 23.44 kg CO2 eq.
approx. 350 h, NVIDIA RTX 8000, 32.03 kg CO2 eq.

Source kg CO2 eq.: https://mlco2.github.io/
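
The figures in the example are consistent with a simple product of power draw, training time, and grid carbon intensity. The sketch below reproduces them under assumed values that the example does not state: 300 W for the V100, 260 W for the RTX 8000, and 0.352 kg CO2 eq. per kWh. The function name and all three constants are illustrative; for an actual submission, use the values reported by a tool such as ML CO2 Impact for your hardware and region.

```python
# Assumed values, chosen because they reproduce the example figures.
CARBON_INTENSITY = 0.352  # kg CO2 eq. per kWh (assumption)
POWER_WATTS = {
    "NVIDIA Tesla V100 PCIe": 300,  # assumption
    "NVIDIA RTX 8000": 260,         # assumption
}


def co2_estimate(hours, power_watts, intensity=CARBON_INTENSITY):
    """Return kg CO2 eq. as energy (kWh) times carbon intensity."""
    energy_kwh = hours * power_watts / 1000
    return energy_kwh * intensity


usage = [
    ("Model A", 72, "NVIDIA Tesla V100 PCIe"),
    ("Model B", 100, "NVIDIA RTX 8000"),
    ("Total V100", 222, "NVIDIA Tesla V100 PCIe"),
    ("Total RTX 8000", 350, "NVIDIA RTX 8000"),
]
for label, hours, gpu in usage:
    kg = co2_estimate(hours, POWER_WATTS[gpu])
    print(f"{label}: {hours} h, {gpu}, {kg:.2f} kg CO2 eq.")
```

Note that the totals cover the final training runs plus the earlier experiments, i.e. 72 + 150 = 222 h and 100 + 250 = 350 h.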

All these details will allow us to assess the overall impact of the L3DAS23 Challenge and implement strategies to offset it.

Important Note on Task 2

Please note that ρ, θ and z should adhere to the following guidelines:

  • ρ should be in the range [1.0, 3.0], representing the distance of the sound source from the vertical axis on which the microphone is placed.
  • θ should be in the range [0°, 360°), representing the angle between the sound source and the front direction of the microphone, calculated counterclockwise. It follows that sounds played in front of the microphone must have θ = 0°.
  • z should be in the range [-1.2, +0.8] and represents the height of the sound source relative to the height of the microphone.

More information on the criteria adopted can be found on the dataset description page.
All participants are encouraged to check that the notation of the submitted results matches the one we have adopted, so as to avoid problems at the evaluation stage, and to make appropriate conversions where necessary.
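
As a sketch of such a conversion, the following turns Cartesian predictions into the (ρ, θ, z) notation and checks the stated ranges. It assumes a frame centred on the microphone with x pointing to the front and y to the left, so that angles grow counterclockwise when seen from above. This axis convention and both function names are assumptions and should be checked against the dataset description.

```python
import math


def cartesian_to_submission(x, y, z):
    """Convert (x, y, z) to (rho, theta in degrees, z); x front, y left assumed."""
    rho = math.hypot(x, y)  # distance from the vertical axis of the microphone
    theta = math.degrees(math.atan2(y, x)) % 360.0
    # A tiny negative angle can round up to exactly 360.0; wrap it back to 0.
    if theta >= 360.0:
        theta = 0.0
    return rho, theta, z


def in_valid_ranges(rho, theta, z):
    """Check the ranges given above for rho, theta and z."""
    return 1.0 <= rho <= 3.0 and 0.0 <= theta < 360.0 and -1.2 <= z <= 0.8
```

Under this convention, a source directly in front of the microphone maps to θ = 0° and a source directly to its left maps to θ = 90°.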

Our team remains available for any clarifications.

Problems filling out the submission form

The submission form was made with Google Forms and may require a Google account to fill out. Some participants may therefore have problems filling out the form due to geographical restrictions applied to Google services.

In this case, you can proceed with sending your results to our email address (l3das@uniroma1.it), specifying "[L3DAS] Challenge Results - Team XXX" as the subject, where XXX is your team name. 

The email must contain the details and files listed earlier on this page. For heavier files, the participant may use any storage service that allows convenient sharing and sufficiently fast downloading and that is accessible from the European Union. A member of the L3DAS team will download the material as soon as possible and give confirmation via email (please consider the time zone).

The email submission deadline coincides with the challenge deadline (Feb 19, 11:59 pm AoE Time).
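
AoE (Anywhere on Earth) corresponds to UTC-12, so the deadline can be converted to any other time zone as in the sketch below. The year 2023 is an assumption based on the other dates on this page, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

AOE = timezone(timedelta(hours=-12), name="AoE")  # Anywhere on Earth, UTC-12
# Feb 19, 11:59 pm AoE; the year 2023 is assumed.
DEADLINE_AOE = datetime(2023, 2, 19, 23, 59, tzinfo=AOE)


def deadline_in(tz=None):
    """Return the deadline in tz, or in the system's local zone if tz is None."""
    return DEADLINE_AOE.astimezone(tz)


print("Deadline (UTC):  ", deadline_in(timezone.utc).isoformat())
print("Deadline (local):", deadline_in().isoformat())
```

In UTC, the deadline falls on Feb 20 at 11:59 am.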