After learning more about Vedal and the AI vtuber Neuro-sama, I wonder if it’s possible to create an AI “buddy” for new streamers to help them get used to talking while live? It would be a good project to add to my resume and I hope it’ll help other introverted streamers.
Some of the members in Vedal’s server told me it would involve different models working together and that Hugging Face is a good place to start looking. Are there any other factors I should consider? I’m a software developer/tester, but an AI novice.
I’m not particularly knowledgeable about the actual Neuro-sama, but from what I’ve gathered from threads and similar discussions, the difficulty in replicating it lies more in the software engineering than in the AI models themselves.
Since the models have become more efficient over time, the core challenges are on the engineering side: pipeline structure, real-time performance, model integration, network processing, and leveraging existing components.
Regarding the AI models needed within the system, a basic understanding of LLMs for text, VAD, ASR, and TTS for audio should likely suffice for implementation. You might have to wrestle with Python or library version mismatches though…
Wow. Did you write those files just to answer my question? Either way, thanks for the valuable information.
It’s not a streamer like Neuro-sama, but I happened to see a buddy-like agent: @NJX-njx on Hugging Face: "Recently, I have open-sourced an AI emotional companion product based on…"
This brings up something I was wondering: what is “self-hosting”? Is it like running software locally? Because that sounds ideal for my idea.
what is “self-hosting”? Is it like running software locally?
Yeah, that understanding is about 90% right. With models downloaded from Hugging Face, you can basically run everything yourself.
However, powerful GPUs are expensive, bulky, and come with the hassle of heat, power consumption, and management. That’s why many people rent a dedicated VPS (provided by Amazon, Google, Hugging Face, etc.) to use instead of their local PC.
I think most people start with LM Studio or Ollama (both fast and easy to use).
Self-hosting, with concrete “do-this” examples
Self-hosting = you run the AI services yourself and your app connects to them via a local/network URL like http://localhost:….
Example 1 — Local LLM server (simplest CLI): Ollama
What you get: a local HTTP API on your machine (default: localhost:11434). (Ollama Document)
How it looks (API call):
curl http://localhost:11434/api/generate -d '{
"model": "gemma3",
"prompt": "Say hi like a friendly streaming buddy."
}'
This exact curl style is in Ollama’s API docs. (Ollama Document)
Notes (important for “self-hosted”):
- Ollama’s API is served locally by default at `http://localhost:11434/api`. (Ollama Document)
- It binds to `127.0.0.1` by default; you can change the bind address (for LAN access) with `OLLAMA_HOST`. (Ollama Document)
- No auth is required for local access, so keep it on localhost unless you add your own protection. (Ollama Document)
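As a sketch, the same request can be issued from Python’s standard library (no extra packages). This assumes a local Ollama server on the default port with a `gemma3` model already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_payload(model: str, prompt: str) -> dict:
    """Body for Ollama's /api/generate; stream=False asks for a single JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a generate request to the local Ollama server and return the reply text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("gemma3", "Say hi like a friendly streaming buddy.")  # needs Ollama running
```

Setting `"stream": False` avoids having to parse Ollama’s default line-delimited streaming output.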
Example 2 — Local LLM server (simplest GUI): LM Studio
What you get: click-to-run local server; also offers OpenAI-compatible endpoints (useful for many apps). (LM Studio)
Steps:
1. Open LM Studio → Developer tab
2. Toggle Start server (it runs on localhost) (LM Studio)
3. Your app connects to LM Studio via:
   - LM Studio REST API, or
   - OpenAI-compatible endpoints (LM Studio)
CLI alternative:
lms server start
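Because LM Studio exposes OpenAI-compatible endpoints, your app can talk to it like any OpenAI-style server. A minimal Python sketch, assuming the server is running on its default `localhost:1234` address and a model is loaded (the model id below is a placeholder):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address

def build_chat_payload(model: str, user_text: str) -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.7,
    }

def chat(model: str, user_text: str) -> str:
    """Send one chat turn to the local server and return the assistant's reply."""
    body = json.dumps(build_chat_payload(model, user_text)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

# chat("your-loaded-model-id", "Hello!")  # requires the LM Studio server to be running
```

The same function works against any OpenAI-compatible server by changing `BASE_URL`.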
Example 3 — “Self-hosted ChatGPT-like web UI” (local or home server): Open WebUI (Docker)
What you get: a browser UI you host yourself (accounts, chat UI). Can connect to a local Ollama instance. (Open WebUI)
Run it locally (Docker):
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Then open `http://localhost:3000`. (Open WebUI)
If your Ollama is on another machine (LAN/VPS):
- Set `OLLAMA_BASE_URL=…` when starting Open WebUI. (Open WebUI)
Example 4 — Full “voice buddy” pipeline (still self-hosted)
4A) Local speech-to-text (STT): Whisper.cpp
What you get: local STT; some setups expose an OpenAI-compatible local endpoint like http://127.0.0.1:2022/v1. (Voice Mode)
4B) Local text-to-speech (TTS): Piper
What you get: fast local neural TTS. (GitHub)
4C) Local text-to-speech (TTS): Coqui TTS
What you get: a local TTS server you run yourself. Example command: (Amica)
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits
Putting it together (conceptual flow):
Mic audio → (local STT) → text → (local LLM) → reply text → (local TTS) → speaker audio
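The flow above can be sketched as three swappable stages composed into one step. The lambdas below are stand-ins, not real clients; in a real build you would swap in Whisper.cpp, an LLM server, and Piper/Coqui:

```python
# Minimal sketch of the STT -> LLM -> TTS loop with stand-in components.
from typing import Callable

def make_buddy(
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose the three stages into one 'mic audio in, speaker audio out' step."""
    def step(mic_audio: bytes) -> bytes:
        text = stt(mic_audio)   # speech -> text
        reply = llm(text)       # text -> reply text
        return tts(reply)       # reply text -> audio
    return step

# Stub wiring just to show the shape of the loop:
buddy = make_buddy(
    stt=lambda audio: "hello buddy",
    llm=lambda text: f"You said: {text}",
    tts=lambda reply: reply.encode(),
)
print(buddy(b"\x00\x01"))  # b'You said: hello buddy'
```

Keeping each stage behind a plain callable makes it easy to replace one component (say, the TTS) without touching the rest of the pipeline.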
Example 5 — Streaming integrations (local control + platform events)
Control OBS locally: obs-websocket
obs-websocket is built into OBS Studio 28+ and lets an external app control OBS via WebSocket. (GitHub)
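For illustration, obs-websocket v5 requests are JSON frames using OpCode 6. A sketch of building one; the scene name and the default port 4455 in the comment are assumptions for the example:

```python
import json
import uuid

def obs_request_frame(request_type: str, request_data: dict) -> str:
    """Build an obs-websocket v5 'Request' message (OpCode 6) as a JSON string."""
    return json.dumps({
        "op": 6,  # OpCode 6 = Request in the obs-websocket v5 protocol
        "d": {
            "requestType": request_type,
            "requestId": str(uuid.uuid4()),  # any unique id; echoed back in the response
            "requestData": request_data,
        },
    })

# e.g. switch the active scene (sent over ws://localhost:4455 after the auth handshake):
frame = obs_request_frame("SetCurrentProgramScene", {"sceneName": "BRB"})
```

Note that before sending requests you must complete the protocol’s Hello/Identify handshake with the server.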
Read stream events: Twitch EventSub WebSockets
Twitch’s EventSub WebSocket server is wss://eventsub.wss.twitch.tv/ws. (Twitch Developers)
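The first message you receive on that socket is a `session_welcome` carrying a session id, which you then use when creating subscriptions via the EventSub REST API. A small parsing sketch based on the documented message shape:

```python
import json

def session_id_from_welcome(raw: str):
    """Extract the session id from a Twitch EventSub 'session_welcome' message.

    Returns None for other message types (keepalive, notification, etc.).
    """
    msg = json.loads(raw)
    if msg.get("metadata", {}).get("message_type") != "session_welcome":
        return None
    return msg["payload"]["session"]["id"]

# Example welcome frame (trimmed to the fields this sketch uses):
welcome = '{"metadata": {"message_type": "session_welcome"}, "payload": {"session": {"id": "abc123"}}}'
print(session_id_from_welcome(welcome))  # abc123
```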
Quick “which self-hosting approach should I pick?”
| Goal | Best starting point |
|---|---|
| “Just run an LLM locally” | Ollama (CLI) (Ollama Document) or LM Studio (GUI) (LM Studio) |
| “I want a self-hosted web UI” | Open WebUI (Docker) (Open WebUI) |
| “I want a voice buddy” | Add Whisper.cpp (STT) (Voice Mode) + Piper/Coqui (TTS) (GitHub) |
| “I want it to drive my stream” | Add OBS WebSocket (GitHub) + Twitch EventSub (Twitch Developers) |
I don’t know why I was acting like other people haven’t done similar projects already. I’m sure I shouldn’t let that discourage me from continuing with my idea, but the wind kinda left my sails a bit…..
Yeah. Well, let’s look at it positively—if you find a good framework for you, it’ll save you time. That means you can spend more time improving the functionality.
When I looked into it before, I found several options beyond ones strictly focused on Neuro-sama’s personality (topic-management skills?) or her behavior as a streamer.
Depending on what you’re looking for, it’s often the case that someone has already built a framework for it.
Decision matrix
Legend:
5 = strongest fit, 1 = weakest fit.
These scores are my synthesis from the projects’ current docs/READMEs: realtime voice coverage, avatar support, memory/character tooling, deployment style, and how much glue code is still needed. (GitHub)
| Option | Closest to “AI streamer buddy” out of the box | Avatar/body included | Realtime voice strength | Character/memory strength | Local / self-host path | Managed / hosted path | How much custom engineering you still need | Best for |
|---|---|---|---|---|---|---|---|---|
| Open-LLM-VTuber | 5 | 5 | 4 | 3 | 5 | 2 | 3 | Closest open-source starting point |
| AIRI | 4 | 5 | 4 | 4 | 5 | 1 | 4 | Ambitious companion-style platform |
| ChatdollKit | 4 | 5 | 4 | 3 | 4 | 1 | 3 | Unity / VRM / 3D avatar builders |
| Pipecat | 2 | 1 | 5 | 2 | 4 | 2 | 4 | Python-first custom stacks |
| LiveKit Agents | 2 | 1 | 5 | 2 | 4 | 4 | 4 | Production-grade realtime backend |
| TEN Framework | 3 | 2 | 5 | 3 | 4 | 2 | 4 | Interruptible, full-duplex voice agents |
| Inworld Runtime | 4 | 3 | 4 | 5 | 2 | 5 | 2 | Managed character platform |
| Convai | 4 | 3 | 4 | 4 | 2 | 5 | 2 | Fast hosted character deployment |
| ElizaOS | 2 | 1 | 1 | 5 | 4 | 2 | 4 | Character brain / plugin layer |
Fast picks
Choose Open-LLM-VTuber if:
You want the closest thing to a self-hosted, open-source “talking on-screen companion” without assembling the whole stack yourself.
Choose AIRI if:
You want the most ambitious open-source “digital being” direction and are comfortable with a more evolving project.
Choose ChatdollKit if:
Your mental model starts with “I want a 3D character in Unity” rather than “I want a voice backend.”
Choose Pipecat or LiveKit Agents if:
You want to engineer the system cleanly yourself and treat avatar/persona as separate layers.
Choose Convai or Inworld Runtime if:
You want the fastest hosted path to a believable voiced character.
Choose ElizaOS if:
You mostly need a character brain, plugin system, and orchestration layer, then plan to pair it with another realtime/avatar stack.
Background on each option
1) Open-LLM-VTuber
This is the most direct open-source match for a Neuro-like setup because it already combines hands-free voice chat, interruption handling, a Live2D talking face, swappable LLM/ASR/TTS backends, offline-capable deployment on macOS/Linux/Windows, and configurable long-term memory via MemGPT. That means you start with a system that already thinks in terms of a talking character, not just a voice pipeline. (GitHub)
Best fit: solo builders who want the shortest path to “AI buddy on screen.”
Main tradeoff: you still need your own stream/community/game connectors if your workflow goes beyond the core companion loop. (GitHub)
2) AIRI
AIRI is one of the strongest “digital life” style projects right now. Its README shows browser and Discord audio input, client-side speech recognition, browser-local inference, VRM support, Live2D support, and even directions like Minecraft/Factorio play and chat integrations. In other words, it is trying to be more than a voice bot: it is aiming at an ongoing embodied companion platform. (GitHub)
Best fit: builders who want a broad companion/agent world and do not mind a project that is still maturing.
Main tradeoff: more moving parts, more experimental surface area, more setup risk than a narrower framework. (GitHub)
3) ChatdollKit
ChatdollKit is the cleanest pick if the center of gravity is a 3D avatar in Unity. Its current README emphasizes 3D model expression, autonomous facial expression/animation control, lip-sync, STT/TTS integration, dialog-state management, wakeword support, multiple LLM providers, and deployment across Unity-supported platforms including WebGL, VR, and AR. (GitHub)
Best fit: VTuber/avatar creators, Unity developers, VRM users.
Main tradeoff: it is much more avatar-engine-centric than “general AI streaming framework” centric. (GitHub)
4) Pipecat
Pipecat is one of the best Python-first bases for building your own Neuro-like system from scratch. Its docs position it as an open-source framework for voice and multimodal AI bots that can see, hear, and speak in real time, with orchestration for AI services, transports, and audio pipelines. It is especially good when you want to prototype quickly in Python and keep control over the stack. (docs.pipecat.ai)
Best fit: Python builders who want flexibility and rapid iteration.
Main tradeoff: you still need to choose and build the avatar layer, memory policy, and streamer-specific integrations yourself. (docs.pipecat.ai)
5) LiveKit Agents
LiveKit Agents is excellent when your biggest problem is not “how do I make a character,” but “how do I make realtime voice feel solid.” Its docs focus on STT→LLM→TTS pipelines, reliable turn detection, interruption handling, provider plugins, and the core mechanics of production-grade voice AI. (docs.livekit.io)
Best fit: teams or advanced builders who want a robust realtime core.
Main tradeoff: it is infrastructure-first, not VTuber-first. You add the body, persona, and content loop yourself. (docs.livekit.io)
6) TEN Framework
TEN sits between bare realtime plumbing and higher-level character systems. Its repo describes an open-source framework for real-time multimodal conversational AI, with an ecosystem that includes agent examples, VAD, and turn detection for full-duplex dialogue. That makes it especially attractive if you care a lot about interruption, overlap, and natural back-and-forth speech behavior. (GitHub)
Best fit: builders who want natural conversational flow and are comfortable assembling pieces.
Main tradeoff: still more framework than finished companion product. (GitHub)
7) Inworld Runtime
Inworld Runtime is a strong managed character platform. Its current docs describe it as an orchestration platform for sophisticated AI characters and voice agents, carrying forward capabilities like knowledge retrieval, safety checks, long-term memory, and expressive voice synthesis from its earlier character tooling. (docs.inworld.ai)
Best fit: teams that want believable characters with less low-level assembly.
Main tradeoff: less self-owned, less local-first, and more platform-shaped than the open-source stacks above. (docs.inworld.ai)
8) Convai
Convai is one of the fastest hosted routes to interactive characters for web/game experiences. Its current Web SDK docs emphasize fast hands-free interaction with real-time audio, text, optional video, character actions, and emotion signals, while its memory docs describe persistent session memory support. (docs.convai.com)
Best fit: people who want a cloud platform that already “thinks in characters.”
Main tradeoff: less local control and more dependence on the vendor’s model of character building. (docs.convai.com)
9) ElizaOS
ElizaOS is best understood as a brain and plugin layer, not a full Neuro-like frontend stack. Its repo describes it as an extensible platform for building and deploying AI-powered applications and game NPCs, and its plugin registry supports dynamic plugin loading for integrations like Discord, browser use, PDF/image/video processing, and local model support. (GitHub)
Best fit: builders who want character files, plugins, and orchestration, then plan to pair that with LiveKit, Pipecat, ChatdollKit, or a custom frontend.
Main tradeoff: realtime voice/avatar embodiment is not its main “out of the box” value. (GitHub)
The simplest decision rule
Pick one from this list depending on your starting point:
- Closest open-source “AI buddy on screen” → Open-LLM-VTuber (GitHub)
- Most ambitious open-source companion project → AIRI (GitHub)
- Best Unity / 3D avatar route → ChatdollKit (GitHub)
- Best Python custom route → Pipecat (docs.pipecat.ai)
- Best production realtime core → LiveKit Agents (docs.livekit.io)
- Best if interruption/full-duplex matters most → TEN Framework (GitHub)
- Best managed character platform → Inworld Runtime or Convai (docs.inworld.ai)
- Best “character brain” to pair with something else → ElizaOS (GitHub)
My recommendation by user type
Beginner who wants a Neuro-like prototype fast:
Start with Open-LLM-VTuber. (GitHub)
Open-source enthusiast who wants a broader long-term project:
Look at AIRI. (GitHub)
Unity / VRM / avatar creator:
Use ChatdollKit. (GitHub)
Python engineer:
Use Pipecat as the voice core, then add your own avatar layer. (docs.pipecat.ai)
Infra-minded or production-minded engineer:
Use LiveKit Agents. (docs.livekit.io)
You care most about natural turn-taking and interruption:
Evaluate TEN Framework. (GitHub)
You want hosted convenience over deep ownership:
Use Convai or Inworld Runtime. (docs.inworld.ai)
You want a personality/plugin system to combine with another stack:
Use ElizaOS as the brain layer. (GitHub)
I’ve gone down the Open-LLM-VTuber path. I forked the repo and installed the latest release. During setup, I deviated from the instructions to try using it with LM Studio, and now it’s not working. So I may go back and follow the instructions exactly.
Oh, I see. LM Studio support was added only recently… In cases like this, it usually takes a while to stabilize…
Yes. Roll back to a known-good baseline first. Then re-introduce LM Studio as a single isolated change. That is the lowest-risk path here. Open-LLM-VTuber’s own docs warn that the project is still unstable and not easy to install, and the official quick start still uses Ollama + sherpa-onnx-asr + edge_tts as the baseline path. LM Studio support was added later in v1.2.0, so mixing setup steps too early is a common way to end up with a broken state. (GitHub)
The two most likely causes
First, your checkout may be incomplete. The frontend is a git submodule, so the docs require either the official release package or git clone --recursive. If the submodule is missing, the web UI can fail with {"detail":"Not Found"}, and the FAQ says to fix that with git submodule update --init --recursive. (Open LLM Vtuber)
Second, your config may be stale. Since v1.1.0, conf.yaml is generated from templates and updated through a config template system. The v1.2.0 release notes mention fixes for config update bugs and changed project_id / organization_id defaults to null to prevent API errors. There are also issue reports showing warnings like “user config contains keys not present in default config,” which is exactly the sort of thing that happens after hand edits or upgrades across config format changes. (Open LLM Vtuber)
Recovery plan
1. Stop trying to fix LM Studio inside the current broken install
Make a copy of your current conf.yaml, then stop using it for the moment. This project’s own update notes say config files can change and should be backed up cautiously during updates. (GitHub)
2. Verify the repo itself is complete
In your project directory, run:
git submodule update --init --recursive
If you originally cloned without --recursive, this is required. If you want the cleanest reset, re-clone from the official repo with --recursive or use the official release ZIP from the release page, not GitHub’s green “Code → Download ZIP.” The docs explicitly say not to use that ZIP because it omits the frontend/submodule state. (Open LLM Vtuber)
3. Reset to the official baseline
The docs recommend Python >= 3.10 and < 3.13, uv as the primary environment manager, and:
uv sync
uv run run_server.py
Running once will generate config in some cases, but the docs actually recommend copying config_templates/conf.default.yaml to conf.yaml instead of relying on auto-generation. They also note that if you exit too late on first run, model downloads may start and partially downloaded files under models/ can block clean startup later. (Open LLM Vtuber)
My recommendation is stricter than the docs here: use a fresh conf.yaml copied from the current template and treat your old file as reference only. That avoids stale keys.
4. Do not customize anything except the baseline LLM
For the first successful run, stay as close as possible to the quick start:
- default agent: `basic_memory_agent`
- default LLM: `ollama_llm`
- default ASR: `sherpa-onnx-asr`
- default TTS: `edge_tts`
- open the app in Chrome, because the docs call out known browser issues in Edge and Safari. (Open LLM Vtuber)
When it works, the backend should start and the web UI should be available at http://localhost:12393. Do not enable Letta yet. The agent docs say that if you switch to letta_agent, the LLM settings in conf.yaml stop being the effective source of truth and the actual model comes from the Letta server instead. (Open LLM Vtuber)
5. Only after the baseline works, swap Ollama for LM Studio
Open-LLM-VTuber’s LLM docs say LM Studio should be configured through openai_compatible_llm. The project treats most providers as wrappers over that same OpenAI-compatible format. (Open LLM Vtuber)
Use this minimal pattern:
agent_config:
  agent_settings:
    basic_memory_agent:
      llm_provider: 'openai_compatible_llm'

llm_configs:
  openai_compatible_llm:
    base_url: 'http://localhost:1234/v1'
    llm_api_key: 'somethingelse'
    organization_id: null
    project_id: null
    model: 'PUT_THE_EXACT_LM_STUDIO_MODEL_ID_HERE'
    temperature: 0.7
Why these fields:
- Open-LLM-VTuber’s docs show `openai_compatible_llm` is the right backend for LM Studio and similar servers. (Open LLM Vtuber)
- LM Studio’s docs say its OpenAI-compatible base URL should point to `http://localhost:1234/v1` by default. (LM Studio)
- Open-LLM-VTuber’s v1.2.0 release notes say `organization_id` and `project_id` were changed to `null` by default to prevent API errors. (Open LLM Vtuber)
6. Prove LM Studio itself works before involving Open-LLM-VTuber
Before starting Open-LLM-VTuber with LM Studio, confirm LM Studio independently:
- In LM Studio, go to the Developer tab and start the local server, or run `lms server start`. (LM Studio)
- Make sure the model is loaded into memory. Open-LLM-VTuber’s FAQ explicitly calls this out for LM Studio users. (Open LLM Vtuber)
- Check the visible model list:
curl http://localhost:1234/v1/models
LM Studio’s docs say this endpoint returns the models visible to the server. Use the exact model identifier it returns in conf.yaml. (LM Studio)
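A small sketch of that check in Python: pull the ids out of the `/v1/models` response and confirm your configured model is among them. The sample model id below is hypothetical:

```python
import json

def model_ids(models_response: str) -> list:
    """Pull the model identifiers out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in json.loads(models_response).get("data", [])]

def check_configured_model(models_response: str, configured: str) -> bool:
    """True if the id written in conf.yaml is actually served by the local server."""
    return configured in model_ids(models_response)

# Hypothetical response body from `curl http://localhost:1234/v1/models`:
sample = '{"data": [{"id": "qwen2.5-7b-instruct"}]}'
print(check_configured_model(sample, "qwen2.5-7b-instruct"))  # True
```

If this returns False for your configured id, fix `conf.yaml` before debugging anything else.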
What not to change yet
Do not touch Letta, MCP, custom ASR, custom TTS, or remote access until local baseline works. The project is in active transition, v2 is being planned, and v1 is still receiving bug fixes rather than becoming simpler. Keeping the first working state narrow matters here. (GitHub)
Also check two easy failure points from the FAQ:
- a proxy that does not bypass `localhost`
- two backends competing for port `12393` (Open LLM Vtuber)
The shortest practical sequence
1. `git submodule update --init --recursive`
2. replace `conf.yaml` with a fresh copy of `config_templates/conf.default.yaml`
3. `uv sync`
4. run the official quick-start baseline and confirm `http://localhost:12393` works in Chrome
5. change only `llm_provider` and the `openai_compatible_llm` block to LM Studio
6. verify the LM Studio server is running and `curl http://localhost:1234/v1/models` returns the exact model id you put in the config (Open LLM Vtuber)