Stream maintains Vision Agents, an open-source Python framework for voice- and vision-driven agents with realtime media over Stream’s WebRTC edge. Cartesia is supported as a TTS provider; install steps, environment variables, and parameters are in Stream’s Cartesia integration. You need a Stream developer account for realtime transport and a Cartesia API key for speech. The “Simple Agent” example on GitHub and the voice and video intros are good starting points.
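
Because both a Stream account and a Cartesia API key are required, it can help to check for credentials before starting an agent. The following is a minimal sketch, not part of Vision Agents itself: the variable names `STREAM_API_KEY`, `STREAM_API_SECRET`, and `CARTESIA_API_KEY` are assumptions, and `missing_env_vars` is a hypothetical helper, so confirm the exact names against Stream’s Cartesia integration page.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumed variable names (not confirmed by this page); check the
# integration docs for the names your installed version expects.
REQUIRED_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET", "CARTESIA_API_KEY")


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` at startup and exiting with a clear message when the returned list is non-empty surfaces a missing key before any realtime session is attempted.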