Use /v1/chat/completions, /v1/embeddings, /v1/rerank, /v1/images/generations, and /v1/audio/*. Behind the curtain, RassyGPT routes to the right model lane for Ian's hardware.
| Model | Kind | Backend | Description |
|---|---|---|---|
rassy-smart | chat | smart | Smart default router that chooses chat, coder, secondary coder, or fast utility lanes. |
rassy-general | chat | general | General-purpose 30B-class Qwen3 instruct model on GPU 7, V100 32GB. |
rassy-coder | chat | coder | Coding specialist on GPUs 0 and 2, dual V100 16GB. |
rassy-coder-secondary | chat | coder_secondary | Secondary coder worker on the V100 12GB lane. |
rassy-fast | chat | fast | Fast utility model on the P40 24GB lane. |
rassy-embed | embeddings | embed | Embeddings service on GPU 5, P100 16GB. |
rassy-rerank | rerank | rerank | Reranker service on GPU 5, with embedding-cosine fallback. |
rassy-image | images | image | LocalAI image lane on the RTX 2080 Ti. |
rassy-audio | audio | audio | LocalAI STT/TTS lane on the RTX 2080 Ti. |