GLM TTS
Z.ai·Audio
LLM-based text-to-speech model with zero-shot voice cloning from 3 to 10 seconds of reference audio, plus emotionally expressive, controllable output via multi-reward reinforcement learning (RL).