Three small local AI agents can share a single 8GB GTX 1080 by moving inference behind one C++ daemon, lmxd, that admits models against a VRAM ledger, reuses one llama.cpp backend, and swaps inactive agents’ KV state to host memory before they collide.
That is the whole story in one sentence...