moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Best for

media

Deployment options

compose · docker

Resource requirements

server

Common setup stack

Reverse proxy · HTTPS certs · auth gateway · backups
