Improved Disentangled Speech Representations Using Contrastive Learning in Factorized Hierarchical Variational Autoencoder
Yuying Xie, Thomas Arildsen, Zheng-Hua Tan
Audio Demos Based on TIMIT
- Proposed: results of the proposed conversion algorithm
- FHVAE: results of the original FHVAE
| Source Speaker / Utterance | Target Speaker / Utterance | Conversion Method |
|---|---|---|
| MWEW0_SI1361 | MTLS0_SI1370 | Proposed |
| MWEW0_SI1361 | MTLS0_SI1370 | FHVAE |
| MJMP0_SI1535 | FMGD0_SI1564 | Proposed |
| MJMP0_SI1535 | FMGD0_SI1564 | FHVAE |
| FDHC0_SI1559 | FNLP0_SI1308 | Proposed |
| FDHC0_SI1559 | FNLP0_SI1308 | FHVAE |
| FPAS0_SI1272 | MTAS1_SI1473 | Proposed |
| FPAS0_SI1272 | MTAS1_SI1473 | FHVAE |