# OnPremize

> On-prem AI for enterprise codebases
> Detailed version: [llms-full.txt](https://onpremize.com/llms-full.txt)
> Last-Updated: 2026-06-05T21:49:17.884Z

OnPremize deploys AI code intelligence inside your network. It combines retrieval-augmented generation (RAG) over your codebase with LoRA fine-tuning so models learn your architecture, patterns, and conventions.

## Capabilities

- **Code-Aware RAG**: Hybrid dense + sparse search over indexed repositories. Returns contextually relevant code snippets for AI-assisted Q&A.
- **LoRA Fine-Tuning**: Train lightweight adapters on your codebase. Supports Qwen, StarCoder2, and other model families via a port-and-adapter architecture.
- **OpenAI-Compatible API**: Drop-in replacement for tooling that already targets the OpenAI API. Also supports the Anthropic API format.
- **Deployment Modes**: On-premises bare metal, Kubernetes, VPC, and fully air-gapped environments. No data leaves your network.
- **Agent Workflows**: Multi-step tool-calling traces and architecture Q&A dataset generation for continuous model improvement.
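
To illustrate what "hybrid dense + sparse search" can mean in practice, here is a minimal sketch that merges two ranked result lists with reciprocal rank fusion (RRF). This is an assumption for illustration only: the fusion method OnPremize actually uses is not stated here, and the function name, the constant `k=60`, and the file paths are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    the scores are summed across lists. `k` dampens the influence
    of top ranks (60 is a commonly used default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query (not real OnPremize output).
dense_hits = ["auth/session.py", "auth/tokens.py", "api/login.py"]
sparse_hits = ["api/login.py", "auth/session.py", "db/users.py"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Files that rank well in both lists rise to the top, while files found by only one retriever still appear lower in the fused ranking.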

## Deployment

Runs on Linux and Kubernetes. Requires a GPU for inference and training. Uses Qdrant for vector storage and BGE-M3 for embeddings, and supports 4-bit quantization to reduce GPU memory use.
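
As a rough guide to why 4-bit quantization matters for GPU sizing, the sketch below estimates the memory taken by model weights alone. The 7-billion-parameter model size is a hypothetical example, not an OnPremize requirement, and the estimate ignores quantization metadata (scales and zero points), activations, the KV cache, and training state, all of which add to real usage.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for model weights only, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical model size, chosen for illustration.
params = 7_000_000_000

fp16_gib = weight_memory_gib(params, 16)  # 16-bit weights
int4_gib = weight_memory_gib(params, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {int4_gib:.2f} GiB")
```

Under these assumptions the weights shrink by a factor of four, which can be the difference between needing a data-center GPU and fitting on a smaller card.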

## Primary Sources

- [Air-Gapped AI Code Assistant](https://onpremize.com/solutions/air-gapped-ai-code-assistant)
- [On-Prem RAG for Source Code](https://onpremize.com/solutions/on-prem-rag-for-source-code)
- [LoRA Fine-Tuning for Private Code](https://onpremize.com/platform/lora-fine-tuning-for-private-code)
- [On-Prem AI Governance](https://onpremize.com/security/on-prem-ai-governance)
- [Machine-Readable Entity Facts](https://onpremize.com/ai-entity.json)

## Pages

- [Home](https://onpremize.com)
- [Privacy Policy](https://onpremize.com/privacy)
- [Terms of Service](https://onpremize.com/terms)
- [Brand Kit](https://onpremize.com/brand)

## Contact

- General: ping@onpremize.com
- Sales: ping@onpremize.com
- Security: ping@onpremize.com