Question 1

What hardware do I need for a self-hosted LLM?

Accepted Answer

Depends on model size and concurrency. A 7B-parameter quantized model runs acceptably on a single A100 or M-series Mac mini for low-volume use. We size hardware in the discovery phase.

Question 2

Which open-source models do you work with?

Accepted Answer

Llama, Qwen, Mistral, DeepSeek, Gemma — via Ollama, vLLM, or llama.cpp. We pick based on your task, latency budget, and license constraints.

Question 3

Who maintains the deployment after handoff?

Accepted Answer

Your team can, with our runbook and ongoing retainer if useful. We don't lock you into needing us forever.

Keep your AI workload — and your data — inside your perimeter.

Right call when…

Questions people ask about Local AI / Self-hosted LLM

Is self-hosted AI the right call?