Improved Disentangled Speech Representations Using Contrastive Learning in Factorized Hierarchical Variational Autoencoder
Yuying Xie, Thomas Arildsen, Zheng-Hua Tan

Audio Demos Based on TIMIT

Proposed - conversion results from the proposed method (FHVAE with contrastive learning)

FHVAE - conversion results from the original FHVAE

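Source Speaker / Speech | Target Speaker / Speech | Conversion
MWEW0_SI1361 | MTLS0_SI1370 | Proposed
MWEW0_SI1361 | MTLS0_SI1370 | FHVAE
MJMP0_SI1535 | FMGD0_SI1564 | Proposed
MJMP0_SI1535 | FMGD0_SI1564 | FHVAE
FDHC0_SI1559 | FNLP0_SI1308 | Proposed
FDHC0_SI1559 | FNLP0_SI1308 | FHVAE
FPAS0_SI1272 | MTAS1_SI1473 | Proposed
FPAS0_SI1272 | MTAS1_SI1473 | FHVAE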