Private AI for your business — hosted in Europe
OpenAI-compatible endpoints, curated open-source models, and transparent credit-based billing. Pilot with Starter from 490€ / month and scale to Business and Enterprise — without sending data to US providers.
Pilot programs available — talk to us.
- Hosting: Germany & EU
- Compliance: GDPR-compliant
- API: OpenAI-compatible
Use cases
Real AI, in the tools you already use
We don't just host models — we bring AI directly into your existing platforms, so your data stays where it lives.
Nextcloud AI
Document summarization, semantic search, and AI chat over your files — built into the Nextcloud you already operate.
GitLab AI
Code explanations, merge request summaries, and a private dev assistant — running inside your GitLab, not someone else's cloud.
Agentic coding & IDE assistants
A private endpoint for Claude Code, Cursor, Continue, and CLI agents. Your code stays in your environment.
Platform
Your private AI platform — fully managed
Built for regulated environments where data sovereignty and predictable operations matter more than benchmarks.
Privacy-first
GDPR-compliant by design. No data leaves your environment, and nothing is shared with public AI APIs.
EU & Germany hosting
Operated in European data centers. Choose Germany or another EU region depending on your compliance needs.
OpenAI-compatible API
Drop-in compatible endpoints. Point your existing tooling at our platform without rewriting your integrations.
Modern open-source models
Curated, production-grade open-source models. Updated and operated by us — no model sprawl on your side.
High-performance inference
Powered by modern AI infrastructure on H100-class hardware, tuned for real-world workloads — not benchmarks.
Transparent usage
Clear usage tracking and a predictable cost structure. No surprise bills from token spikes.
Drop-in compatible
OpenAI-compatible — switch in 5 lines
Same SDKs, same endpoints. Point your base URL and API key at b'nerd and your existing tooling runs on private AI.
curl https://api.bnerd.com/v1/chat/completions \
  -H "Authorization: Bearer $BNERD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Python (openai package)
import os

from openai import OpenAI

# Point the standard OpenAI SDK at the b'nerd endpoint.
client = OpenAI(
    base_url="https://api.bnerd.com/v1",
    api_key=os.environ["BNERD_API_KEY"],
)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)
// JavaScript (openai package)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.bnerd.com/v1",
  apiKey: process.env.BNERD_API_KEY,
});
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello!" }],
});
Example model name; we'll share the current set of available models on request.
Architecture
Built for control and transparency
A platform you can grow into — from a first pilot to production-grade AI features across your tools.
- Shared infrastructure with clear isolation: Workloads run on shared platform infrastructure with strict tenant isolation and tier-based prioritization. No surprise model swaps, no foreign data in your inference.
- Hosted in Europe: Operated in EU and German data centers, under European jurisdiction. Data residency is a deployment choice, not a footnote.
- Open architecture: Open-source models, OpenAI-compatible API, and standard integrations, so you can move, swap, or self-host later. No vendor lock-in.
- From Starter to Enterprise: Pilot with Starter, scale to Business for production load, move to Enterprise for governance, compliance, and custom SLAs.
Managed AI API Gateway
Pricing
Three packages for any workload. Token prices and the credit system apply uniformly across all tiers.
All prices excl. VAT. B2B only.
Which package fits you?
- Starter: for smaller production workloads, internal assistants, RAG prototypes, and controlled API usage.
- Business: for team and enterprise workloads with higher throughput, more stable usage, and prioritized processing.
- Enterprise: for business-critical AI workloads with governance, compliance, integration, and custom operational requirements.
What's included (Starter)
- 20M credits / month
- Shared best-effort AI infrastructure
- OpenAI-compatible API
- Standard queue priority
- Standard API limits & context
- Baseline monitoring
- GDPR-compliant hosting in the EU
- Availability: up to 99.5%
- Email support
What's included (Business)
- 50M credits / month
- Prioritized processing within shared infrastructure
- Extended API limits
- Higher requests / tokens per minute
- Extended context limits
- Full access to Premium models
- Optional VPN access · SSO available
- Monitoring & extended usage reporting
- Availability: up to 99.9%
- Email support · Slack Connect / Teams optional
What's included (Enterprise)
- Custom credit quotas
- Highest priority within shared infrastructure
- Custom API limits & concurrency
- Extended context limits
- Full access to Premium models
- Private networking available
- VPN / SSO integration
- Audit logging & extended reporting
- Custom model integrations optional
- Custom SLA agreements
- Priority support (email, phone, Slack Connect / Teams)
Token pricing
Prices per 1M tokens. Applies uniformly across all packages.
Credit system
The platform bills based on credits. Different model classes consume different amounts of credits per token.
- Standard 1× credits
- Advanced 3× credits
- Premium 6× credits
- 10M Standard tokens = 10M credits
- 2M Advanced tokens = 6M credits
- 0.5M Premium tokens = 3M credits
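The conversion above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `credits_used` and the lowercase class keys are assumptions made for this example, not part of the platform's API.

```python
# Credit factors per model class, as listed above.
CREDIT_FACTORS = {"standard": 1, "advanced": 3, "premium": 6}


def credits_used(tokens_by_class: dict[str, int]) -> int:
    """Convert token counts per model class into credits (hypothetical helper)."""
    return sum(CREDIT_FACTORS[cls] * tokens for cls, tokens in tokens_by_class.items())


# The three examples from the list, combined into one month of usage.
usage = {"standard": 10_000_000, "advanced": 2_000_000, "premium": 500_000}
print(credits_used(usage))  # 19000000
```

Combined, the three example workloads consume 10M + 6M + 3M = 19M credits.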
Credit calculator
Estimate your usage and see what each package would cost — including on-demand overage.
Enter millions of tokens per month. Credit factors: Standard 1×, Advanced 3×, Premium 6×.
Non-binding estimate. Usage beyond the included quota is billed at on-demand rates (allocated proportionally across model classes).
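As a rough sketch of what such an estimate involves, the following Python computes how many credits exceed an included quota and splits that overage across model classes. It assumes "allocated proportionally" means in proportion to each class's share of total credits consumed; that reading, the function name `overage_by_class`, and the example figures are assumptions. On-demand prices are not modeled here.

```python
# Credit factors per model class, as stated on this page.
CREDIT_FACTORS = {"standard": 1, "advanced": 3, "premium": 6}


def overage_by_class(tokens_m: dict[str, float], quota_m: float) -> dict[str, float]:
    """Return overage credits (in millions) per model class.

    Assumption: overage is split in proportion to each class's share
    of the total credits consumed in the month.
    """
    credits = {cls: CREDIT_FACTORS[cls] * t for cls, t in tokens_m.items()}
    total = sum(credits.values())
    if total == 0:
        return {cls: 0.0 for cls in credits}
    overage = max(0.0, total - quota_m)
    return {cls: overage * c / total for cls, c in credits.items()}


# Example: 10M Standard, 2M Advanced, 1M Premium tokens against a 20M quota.
# Total is 10 + 6 + 6 = 22M credits, so 2M credits are overage.
result = overage_by_class({"standard": 10, "advanced": 2, "premium": 1}, quota_m=20)
print({cls: round(v, 2) for cls, v in result.items()})
# {'standard': 0.91, 'advanced': 0.55, 'premium': 0.55}
```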
Term & prepayment
Longer terms or prepayment increase your credit quota — list prices stay the same.
Monthly (Standard)
- Standard pricing
- Flexible usage
- No minimum term
12-month commitment (+20% credits)
- +20% additional credits per month
- Stable price baseline for 12 months
12-month prepayment (+30% credits)
- +30% additional credits per month
- Prepaid: one invoice per year
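To see what the bonus means in practice, here is a small sketch that applies the term bonus to an included monthly quota. The term keys and the function name `effective_quota` are illustrative assumptions, and the sketch assumes the bonus applies to the package's included monthly quota.

```python
# Bonus credits per term option, in percent, as listed above.
TERM_BONUS_PERCENT = {"monthly": 0, "commitment_12m": 20, "prepayment_12m": 30}


def effective_quota(base_credits: int, term: str) -> int:
    """Monthly credit quota after applying the term bonus (hypothetical helper)."""
    return base_credits * (100 + TERM_BONUS_PERCENT[term]) // 100


# A 20M quota becomes 24M with a 12-month commitment and 26M with prepayment.
print(effective_quota(20_000_000, "commitment_12m"))  # 24000000
print(effective_quota(20_000_000, "prepayment_12m"))  # 26000000
```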
Fair usage & performance
An API-first managed service: you focus on integration, we run the infrastructure and models. Tier-based limits keep performance stable under load.
Requests per minute
Tier-based RPM limits protect the platform and ensure predictable response times.
Tokens per minute
TPM limits scale with your package — Business and Enterprise have significantly higher throughput.
Context limits
Maximum context size per request, depending on package and model.
Queue prioritization
Prioritized lanes for Business and Enterprise keep response times stable under load.
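On the client side, RPM and TPM limits are usually handled by retrying with backoff. The sketch below is one way to do that in plain Python. It assumes your client raises some error when a limit is hit; `RateLimited` is a hypothetical stand-in for that error, and the delays are illustrative rather than values recommended by the platform.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on a rate limit."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait 1x, 2x, 4x ... base_delay, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

If you use the official `openai` Python SDK, its client also accepts a `max_retries` option, which may be enough on its own for simple cases.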
Private AI — FAQs
Common questions about data residency, integrations, and the engagement model.
- All inference and any associated data stays inside the platform we operate for you, in EU or German data centers. Nothing is forwarded to public AI APIs, and nothing is used to train shared models.
- Yes. We expose OpenAI-compatible endpoints, so most existing client libraries, SDKs, and integrations work by changing the base URL and key. We help with the rollout.
- Nextcloud, GitLab, and any tool that speaks the OpenAI-compatible API, including Claude Code, Cursor, Continue, and CLI agents. Custom integrations are part of the engagement.
- Three packages (Starter 490€, Business 1490€, Enterprise from 2490€) with a monthly base fee and an included credit quota. Token usage is billed via the credit system; see the pricing section. A 12-month commitment adds +20% credits, prepayment adds +30%.
- Yes. The platform can be deployed in our managed environment, on dedicated bare metal we operate for you, or, for some setups, inside your own infrastructure as a managed service.
- A curated set of production-grade open-source models from the Llama, Qwen, Mistral, and DeepSeek families, grouped into Standard, Advanced, and Premium classes. The model list is refreshed regularly; we'll share the current set on request.
- Operational logs are kept short-term and rotated automatically. Inference content is never used to train models and is not retained beyond processing. Exact retention windows are set contractually.
- Yes. We provide a GDPR-compliant DPA that covers EU hosting, sub-processors, and technical & organizational measures.
- Monthly billing (standard) or 12-month commitment / prepayment with bonus credits. Monthly plans can be canceled at the end of the month. Commitment plans run for the agreed term; renewal is optional.
- Yes. Pilot programs let you validate in your environment; we tailor scope, models, and limits to your use case. Get in touch and we'll size one for you.
- Starter: email support during business hours. Business: email plus optional Slack Connect / Teams. Enterprise: priority support including phone, with response times per the agreed SLA.
Do you have questions or would you like a personalized offer? We are happy to advise you.
Contact
Our cloud experts are happy to provide personalized advice.
- Our Office: Sillemstraße 76A, 20257 Hamburg, Germany
- Hours: Mon - Fri, 09:00 AM - 06:00 PM
- Phone: +49 40 239 69 754 0
- Email: hello@bnerd.com