Add ECAPA2 to VoxCeleb #3039

Open
othman-istaiteh wants to merge 8 commits into speechbrain:develop from othman-istaiteh:add-ecapa2

@othman-istaiteh

What does this PR do?

This PR implements the ECAPA2 model architecture and its corresponding training recipe for VoxCeleb.

Key Additions:

  • speechbrain/lobes/models/ECAPA2.py: Implementation of the ECAPA2 architecture and SubCenterClassifier.
  • speechbrain/nnet/losses.py: Added JeffreysLoss for embedding regularization.
  • VoxCeleb Recipe (recipes/VoxCeleb/SpeakerRec/):
    • Added train_ecapa2.yaml and verification_ecapa2.yaml.
    • Updated train_speaker_embeddings.py and speaker_verification_cosine.py to support the new model and pipeline requirements.
    • Preserved backward compatibility: existing models (X-Vector, ResNet, ECAPA-TDNN) run without modification.

Testing & Validation:

  • Added ECAPA2 test entries to tests/recipes/VoxCeleb.csv.
  • Ran the pytest suite to confirm that existing functionality remains intact.
  • Passed all doctests.
  • Ran pre-commit run -a to check code formatting and linting.

Performance:

Trained on VoxCeleb 1 + VoxCeleb 2:

  • VoxCeleb1-O: 0.60% EER (with s-norm) / 0.70% EER (without s-norm)
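
For readers unfamiliar with s-norm (symmetric score normalization), the idea can be sketched as below. This is a minimal, framework-free illustration and not the recipe's implementation; the function name `s_norm`, its arguments, and the toy cohort scores are assumptions made for this example.

```python
from statistics import mean, pstdev


def s_norm(raw_score, enrol_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (s-norm) of a single trial score.

    enrol_cohort_scores: scores of the enrollment embedding against an
        impostor cohort (hypothetical input for this sketch).
    test_cohort_scores: scores of the test embedding against the same cohort.
    """
    z_enrol = (raw_score - mean(enrol_cohort_scores)) / pstdev(enrol_cohort_scores)
    z_test = (raw_score - mean(test_cohort_scores)) / pstdev(test_cohort_scores)
    # Average the two z-normalized views of the same trial score.
    return 0.5 * (z_enrol + z_test)


# Toy example: a raw cosine score of 0.62 normalized against two small cohorts.
normalized = s_norm(0.62, [0.10, 0.20, 0.30], [0.05, 0.15, 0.25])
```

The normalized score expresses how far the trial stands out from impostor scores on both sides of the comparison, which is why results are usually reported both with and without it.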

Trained on VoxCeleb 2 only (tested without s-norm):

  • VoxCeleb1-O: 0.79% EER
  • VoxCeleb1-E: 1.00% EER
  • VoxCeleb1-H: 1.76% EER
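
As a reminder of what the EER figures above measure, here is a small sketch of an equal error rate computation on verification scores. It is a simplified threshold sweep written for illustration, not the scoring code used by the recipe; the function name and the toy scores are assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return an approximate EER (as a fraction) from trial scores.

    target_scores: scores of same-speaker trials.
    nontarget_scores: scores of different-speaker trials.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for thr in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # Keep the operating point where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)
```

On these toy scores the function returns 0.25, that is, a 25% EER; in the same convention the 0.60% reported above corresponds to 0.006.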

Fixes N/A

Breaking changes: None. Backward compatibility is maintained for existing VoxCeleb scripts.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@othman-istaiteh othman-istaiteh marked this pull request as ready for review March 4, 2026 18:39
Collaborator

@TParcollet TParcollet left a comment

Hey! Thank you very much for this recipe! I won't be able to try it because we do not have the VoxCeleb data. Before we find someone to try it, could you please address the comments?

num_workers: !ref <num_workers>

# Functions
use_tacotron2_mel_spec: True
Collaborator

Why use this? Is there a reason not to use standard Mel spectrograms?

Author

No, I just followed the setting used in this recipe: https://github.com/speechbrain/speechbrain/blob/develop/recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn_mel_spec.yaml

But I can change it to standard Mel spectrograms.

Collaborator

@TParcollet TParcollet May 27, 2026

Oh, this is new and interesting, so that's OK. Could you also try the standard ones, just to see the end result?

@@ -0,0 +1,97 @@
# ################################
# Model: Speaker Verification Baseline for ECAPA2
# Acknowledgment: The source code is derived from the Kiwano toolkit.
Collaborator

Add author name for tracking.

|-----------------|------------|------| -----|
| Xvector + PLDA | VoxCeleb 1,2 | 3.23% | https://www.dropbox.com/sh/ab1ma1lnmskedo8/AADsmgOLPdEjSF6wV3KyhNG1a?dl=0 |
| ECAPA-TDNN | VoxCeleb 1,2 | 0.80% | https://www.dropbox.com/sh/ab1ma1lnmskedo8/AADsmgOLPdEjSF6wV3KyhNG1a?dl=0 |
| ECAPA2 | VoxCeleb 1,2 | 0.60% | https://drive.google.com/drive/folders/1cpU5qpCVM30Ip8I85EPM33lsUYPa6S7q?usp=sharing |
Collaborator

@Adel-Moumen, can we please upload this to Dropbox?

with torch.no_grad():
    feats = params["compute_features"](wavs)
    if (
        "use_tacotron2_mel_spec" in params
Collaborator

Yes, this is a bit confusing. See the question above: I'd prefer that we use standard Mel spectrograms. Is there a real difference?
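
If the flag stays, one way to make the check easier to read is a single lookup with a default, so that hparams files without the key behave exactly as before. A minimal sketch, assuming `params` is a dict-like object; the helper name `uses_tacotron2_mel_spec` is hypothetical and does not appear in the PR:

```python
def uses_tacotron2_mel_spec(params):
    """Return True only when the flag is present and truthy.

    Older hparams files do not define the key, so the default keeps
    their behaviour unchanged. The helper name is hypothetical.
    """
    return bool(params.get("use_tacotron2_mel_spec", False))


# Toy stand-ins for loaded hparams dictionaries.
legacy_params = {"compute_features": None}
ecapa2_params = {"use_tacotron2_mel_spec": True}

legacy_flag = uses_tacotron2_mel_spec(legacy_params)
ecapa2_flag = uses_tacotron2_mel_spec(ecapa2_params)
```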



class SubCenterClassifier(nn.Module):
    """Sub-Center ArcFace Classifier.
Collaborator

Docstring isn't explicit enough. I don't know what this is.
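
For context on what a fuller docstring could explain: sub-center ArcFace typically keeps K weight vectors ("sub-centers") per speaker, scores an embedding against each of them by cosine similarity, keeps the maximum per class, and applies an additive angular margin to the target class during training. A framework-free sketch of that idea follows; the function names, the margin, and the scale are illustrative assumptions and are not taken from the PR's SubCenterClassifier.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def subcenter_logits(embedding, class_subcenters):
    """Per-class cosine scores, keeping the best of K sub-centers per class.

    class_subcenters: one entry per class, each a list of K sub-center vectors.
    """
    return [
        max(_cosine(embedding, center) for center in subcenters)
        for subcenters in class_subcenters
    ]


def add_arcface_margin(logits, target, margin=0.2, scale=30.0):
    """Add an angular margin to the target class, then rescale all logits.

    margin and scale are assumed example values, not the PR's settings.
    """
    out = []
    for idx, cos_theta in enumerate(logits):
        if idx == target:
            # Clamp before acos to guard against rounding outside [-1, 1].
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            cos_theta = math.cos(theta + margin)
        out.append(scale * cos_theta)
    return out


# Two classes with two sub-centers each (toy 2-D vectors).
centers = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
logits = subcenter_logits([0.0, 2.0], centers)
margin_logits = add_arcface_margin(logits, target=0)
```

Taking the maximum over sub-centers lets noisy or atypical utterances attach to a secondary center instead of pulling the main one, which is the usual motivation for this design.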



class ECAPA2Res2NetConv1d(nn.Module):
    """Res2Net convolutional block for 1D features."""
Collaborator

Same here: the docstring isn't explicit enough.



class ECAPA2TDNNBlock(nn.Module):
    """TDNN block for ECAPA2."""
Collaborator

Same here: the docstring isn't explicit enough.



class ECAPA2DenseBlock(nn.Module):
    """Dense convolutional block for ECAPA2."""
Collaborator

Same here: the docstring isn't explicit enough.



class ECAPA2AttentiveStatPoolingBlock(nn.Module):
    """Attentive Statistics Pooling for ECAPA2."""
Collaborator

Same here: the docstring isn't explicit enough.



class JeffreysLoss(nn.Module):
    """Computes the Jeffreys Loss, a combination of Cross Entropy, Label Smoothing,
Collaborator

Can we get a unit test for this new loss please?
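
A sketch of the kind of property checks such a test could contain. Since the exact formulation of the PR's `JeffreysLoss` is not shown in this excerpt, the code below exercises a stand-in `jeffreys_divergence` function (the symmetrized KL divergence between two discrete distributions, an assumption made purely for illustration); a real test would call the class added to speechbrain/nnet/losses.py instead.

```python
import math


def jeffreys_divergence(p, q, eps=1e-12):
    """Stand-in: symmetrized KL divergence KL(p||q) + KL(q||p).

    This is NOT the PR's JeffreysLoss; it only illustrates the checks.
    """
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def test_jeffreys_divergence_properties():
    uniform = [0.25, 0.25, 0.25, 0.25]
    peaked = [0.85, 0.05, 0.05, 0.05]
    # Zero when both distributions are identical.
    assert abs(jeffreys_divergence(uniform, uniform)) < 1e-9
    # Symmetric in its arguments, unlike plain KL divergence.
    assert math.isclose(
        jeffreys_divergence(uniform, peaked),
        jeffreys_divergence(peaked, uniform),
    )
    # Strictly positive when the distributions differ.
    assert jeffreys_divergence(uniform, peaked) > 0.0


# pytest would collect the function above; calling it directly also works.
test_jeffreys_divergence_properties()
```

For the actual loss, similar checks would likely cover output shape, finiteness, and that gradients flow to the logits.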
