RassyGPT 1.0.0

One local OpenAI-style gateway. Many GPU engines.

Use /v1/chat/completions, /v1/embeddings, /v1/rerank, /v1/images/generations, and /v1/audio/*. Behind the curtain, RassyGPT routes to the right model lane for Ian's hardware.

Base URL
/v1
Auth
Authorization: Bearer <RASSYGPT_API_KEY>
ModelKindBackendDescription
rassy-smartchatsmartSmart default router that chooses chat, coder, secondary coder, or fast utility lanes.
rassy-generalchatgeneralGeneral-purpose 30B-class Qwen3 instruct model on GPU 7, V100 32GB.
rassy-coderchatcoderCoding specialist on GPUs 0 and 2, dual V100 16GB.
rassy-coder-secondarychatcoder_secondarySecondary coder worker on the V100 12GB lane.
rassy-fastchatfastFast utility model on the P40 24GB lane.
rassy-embedembeddingsembedEmbeddings service on GPU 5, P100 16GB.
rassy-rerankrerankrerankReranker service on GPU 5, with embedding-cosine fallback.
rassy-imageimagesimageLocalAI image lane on the RTX 2080 Ti.
rassy-audioaudioaudioLocalAI STT/TTS lane on the RTX 2080 Ti.