Improved Disentangled Speech Representations Using Contrastive Learning in Factorized Hierarchical Variational Autoencoder

Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Audio Demos Based on TIMIT

  • Proposed - the proposed conversion algorithm
  • FHVAE - the original FHVAE results
Source Speaker / Speech    Target Speaker / Speech    Conversion
MWEW0_SI1361               MTLS0_SI1370               Proposed
                                                      FHVAE
MJMP0_SI1535               FMGD0_SI1564               Proposed
                                                      FHVAE
FDHC0_SI1559               FNLP0_SI1308               Proposed
                                                      FHVAE
FPAS0_SI1272               MTAS1_SI1473               Proposed
                                                      FHVAE