Production AI, not pilot theatre.
We design, build and operate AI-native systems that survive contact with real users — agents, RAG, evaluation harnesses, and the infrastructure underneath.
Five capabilities. All shipped, all maintained.
- 01
Agent and workflow systems
LangGraph, Temporal, custom orchestrators. Tool use, planning, long-running workflows. Stateful, observable, restartable.
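The shape, in a minimal sketch. The step names and the JSON-file checkpoint below are stand-ins for what LangGraph checkpoints or Temporal histories do in production:

import json
from pathlib import Path

# Minimal restartable workflow: each step's output is checkpointed,
# so a crashed or redeployed worker resumes instead of starting over.
# Step names and the JSON-file store are illustrative; in production
# this role is played by LangGraph checkpoints or Temporal histories.

STATE = Path("run-001.json")

def plan(ctx):      return {**ctx, "plan": ["fetch", "summarize"]}
def fetch(ctx):     return {**ctx, "docs": ["..."]}        # tool call here
def summarize(ctx): return {**ctx, "answer": "..."}        # model call here

STEPS = [("plan", plan), ("fetch", fetch), ("summarize", summarize)]

def run():
    ctx = json.loads(STATE.read_text()) if STATE.exists() else {"done": []}
    for name, step in STEPS:
        if name in ctx["done"]:
            continue                       # already checkpointed: skip on resume
        ctx = step(ctx)
        ctx["done"].append(name)
        STATE.write_text(json.dumps(ctx))  # observable, restartable state
    return ctx

if __name__ == "__main__":
    print(run())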
- 02
Retrieval (RAG)
Hybrid search (semantic + lexical), document chunking strategies, reranking, citation. Postgres + pgvector or Weaviate / Qdrant based on scale.
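The fusion step, sketched. Reciprocal rank fusion is one common way to merge the two halves; the doc ids and the k=60 constant here are illustrative, not tuned values:

# Reciprocal rank fusion: merge the semantic and lexical halves of
# hybrid search into one ranking before reranking.

def rrf(ranked_lists, k=60):
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc7", "doc2", "doc9"]   # pgvector / Weaviate / Qdrant
lexical  = ["doc2", "doc4", "doc7"]   # Tantivy / Meilisearch (BM25)
print(rrf([semantic, lexical]))       # ['doc2', 'doc7', 'doc4', 'doc9']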
- 03
Model integration
OpenAI, Anthropic, open-source via vLLM. Vendor-neutral by default. Routing logic for cost, latency, and capability.
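Routing, reduced to its core: capability filter first, then price. The rates and capability tags below are placeholders, not quoted prices:

# Routing sketch: pick a model per request by capability, then cost.
# Prices and capability sets here are made up for illustration.

MODELS = [
    {"name": "small-open-model", "caps": {"chat"},                    "usd_per_mtok": 0.2},
    {"name": "claude-sonnet",    "caps": {"chat", "tools"},           "usd_per_mtok": 3.0},
    {"name": "frontier-model",   "caps": {"chat", "tools", "vision"}, "usd_per_mtok": 10.0},
]

def route(required_caps):
    viable = [m for m in MODELS if required_caps <= m["caps"]]
    if not viable:
        raise ValueError(f"no model satisfies {required_caps}")
    return min(viable, key=lambda m: m["usd_per_mtok"])  # cheapest capable model

print(route({"chat"})["name"])            # small-open-model
print(route({"chat", "tools"})["name"])   # claude-sonnet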
- 04
Evaluation, observability and guardrails
Eval datasets and harnesses (Braintrust, Langfuse, custom). Tracing, regression detection, prompt and model versioning.
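The gate itself is small. A sketch with made-up case ids and scores, mirroring the terminal demo further down; Braintrust or Langfuse hold the real data:

# Eval gate sketch: compare each case against the last accepted
# baseline and block the deploy if the aggregate score drops by more
# than the regression threshold.

THRESHOLD = 0.02

def gate(baseline: dict, candidate: dict) -> bool:
    for case in baseline:
        old, new = baseline[case], candidate[case]
        if new < old:
            print(f"  case {case} {old:.2f} -> {new:.2f} ({new - old:+.2f})")
    mean_base = sum(baseline.values()) / len(baseline)
    mean_cand = sum(candidate.values()) / len(candidate)
    if mean_base - mean_cand > THRESHOLD:
        print(f"blocked: regression > threshold ({THRESHOLD})")
        return False
    return True

gate({"087": 0.71, "104": 0.69, "121": 1.00},
     {"087": 0.62, "104": 0.61, "121": 0.83})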
- 05
Deployment, monitoring and cost control
CI/CD with eval gates, real-time cost dashboards, rate-limiting, fallback strategies. SOC2-aligned where required.
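Fallback strategies, sketched. The provider functions here are stubs; each real one wraps a client with its own timeout, retry budget, and cost meter:

import time

# Fallback chain: try providers in order, degrade instead of failing.

def call_primary(prompt):   raise TimeoutError("primary overloaded")
def call_secondary(prompt): return f"answer from secondary: {prompt!r}"

CHAIN = [call_primary, call_secondary]

def complete(prompt, attempts_per_provider=2):
    last_err = None
    for provider in CHAIN:
        for attempt in range(attempts_per_provider):
            try:
                return provider(prompt)
            except Exception as err:              # narrow this in real code
                last_err = err
                time.sleep(0.1 * (attempt + 1))   # simple backoff
    raise RuntimeError("all providers failed") from last_err

print(complete("ping"))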
Honest about boundaries.
Foundation model R&D
Not our edge. We integrate, evaluate and operate — we don't pretrain.
Pure prompt-engineering no-code agents
Not engineering. Use Make, Zapier or n8n directly — they're better at it.
One-off chatbot widgets without evals
We won't ship something we can't measure. If there's no eval surface, there's no engagement.
Discovery, build, operate, hand off.
Discovery — 2 to 6 weeks, flat fee
Architecture, scope, risks, costs. Output is a plan you could build with anyone, including not us.
Build — 8 to 20 weeks, weekly milestones
Senior engineers, weekly demos from real branches, merges into main from week one.
Operate — ongoing, monthly retainer
Eval gates in CI, production traces sampled and scored, on-call coverage when agreed.
Hand off — optional
When your team is ready to own it, we leave runbooks, ADRs, and a handover plan that holds.
The tools we reach for first.
- Anthropic Claude · Default reasoning + agent tool use
- OpenAI GPT-4 / GPT-5 · Multimodal, strong tool routing
- Open-source via vLLM · When latency or data residency demands it
- Voyage / Cohere · Embeddings, reranking
- Together AI · Routing across open weights
- Postgres + pgvector · First choice up to ~10M chunks
- Weaviate · When schema and scale grow together
- Qdrant · When sub-100ms latency is the constraint
- Tantivy / Meilisearch · Lexical half of hybrid search
- BM25 hybrid · Default reranking layer
- Vercel / AWS / GCP · Per workload — not religious
- Temporal · Long-running, restartable workflows
- Inngest / SQS · Event-driven background work
- Langfuse / Braintrust · Tracing + eval surface
- Sentry / Grafana · Errors, latency, cost panels
Evals first. Always.
- Evaluation datasets built with your engineering team
- Pre-deploy eval gates in CI
- Production traces sampled, scored, fed back
$ ai-pipeline ship \
--eval-gate=v0.4 \
--regression-threshold=0.02 \
--trace=production
[eval] running 145 cases against gate v0.4
[eval] 142 / 145 passed (97.93%)
[eval] 3 regressions on long-context retrieval (set: rag.long-ctx.v3)
· case 087 rouge-l 0.71 → 0.62 (-0.09)
· case 104 rouge-l 0.69 → 0.61 (-0.08)
· case 121 citation 1.00 → 0.83 (-0.17)
✗ blocked. regression > threshold (0.02)
→ inspect: https://app.braintrust.dev/auralink/rag/runs/8b4f...
Operations is the work.
Uptime SLO target, enforced via fallback chains.
Per-tenant cost dashboards — default deliverable, not a paid add-on.
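The accounting behind those dashboards, sketched with placeholder rates; real prices come from the provider's price sheet and are versioned with the routing config:

from collections import defaultdict

# Per-tenant cost meter: the core of the dashboard's data model.

PRICE_PER_MTOK = {"claude-sonnet": 3.00, "small-open-model": 0.20}

spend = defaultdict(float)  # tenant -> USD this billing window

def record(tenant: str, model: str, input_toks: int, output_toks: int):
    tokens = input_toks + output_toks  # single blended rate for brevity
    spend[tenant] += tokens / 1_000_000 * PRICE_PER_MTOK[model]

record("acme", "claude-sonnet", 12_000, 1_500)
record("acme", "small-open-model", 300_000, 40_000)
print(f"acme: ${spend['acme']:.4f}")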
Three engagement models. Honest ranges.
Discovery
- Architecture and ADRs
- Risk register
- Cost bands
Build
- Senior engineers
- Weekly demos
- Eval-gated deploys
Operate
- On-call coverage
- Eval + regression
- Cost ops
All engagements scoped on a call. We don't list standard packages because there aren't any.
Six questions we get every week.
How small is too small?
For anything under four weeks of build, we recommend Discovery only. We won't sell you a multi-month engagement we can't justify.
Do you sign DPAs / SOC2 / NDAs?
Yes. NDAs by default; DPAs and SOC2-aligned controls available on request, with our standard policies on file.
Can you work with our existing engineering team?
Yes. We pair with in-house engineers, share branches, do code review, and write the kind of documentation your team can own after we leave.
We already use a specific vendor (OpenAI, Anthropic, etc.). Is that fine?
Yes. We're vendor-neutral on principle. We'll work inside the constraints you've committed to and tell you honestly when a different choice would change the outcome.
What if the AI part doesn't pan out?
Discovery is structured to find that out before you spend a build budget. Half the value of Discovery is sometimes deciding not to build the thing.
Where are you?
Tel Aviv. Remote across EU, Israel and US time zones. We overlap with London/Berlin in the morning and East Coast US in the afternoon.
Have a system that needs to work, not just demo?
Tell us what you're trying to ship. We'll reply with concrete next steps — usually within two business days.
