03 / Practice
Core practice · AI Engineering

Production AI, not pilot theatre.

We design, build and operate AI-native systems that survive contact with real users — agents, RAG, evaluation harnesses, and the infrastructure underneath.

01
What we build

Five capabilities. All shipped, all maintained.

Documentation-style, not marketing-style. If a row reads like a brochure, we cut it.
  • 01

    Agent and workflow systems

    LangGraph, Temporal, custom orchestrators. Tool use, planning, long-running workflows. Stateful, observable, restartable.

  • 02

    Retrieval (RAG)

    Hybrid search (semantic + lexical), document chunking strategies, reranking, citation. Postgres + pgvector or Weaviate / Qdrant based on scale.

  • 03

    Model integration

    OpenAI, Anthropic, open-source via vLLM. Vendor-neutral by default. Routing logic for cost, latency, and capability.

  • 04

    Evaluation, observability and guardrails

    Eval datasets and harnesses (Braintrust, Langfuse, custom). Tracing, regression detection, prompt and model versioning.

  • 05

    Deployment, monitoring and cost control

    CI/CD with eval gates, real-time cost dashboards, rate-limiting, fallback strategies. SOC2-aligned where required.
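The routing logic in row 03 can be as small as a constraint filter over a model catalogue. A minimal sketch; the model names, prices, and latencies below are illustrative placeholders, not vendor quotes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float   # illustrative, not a real price
    supports_tools: bool
    p50_latency_ms: int

# Hypothetical catalogue -- substitute whatever your vendors actually offer.
CATALOGUE = [
    Model("fast-small", 0.0002, False, 300),
    Model("tool-mid", 0.003, True, 900),
    Model("frontier", 0.015, True, 2500),
]

def route(needs_tools: bool, latency_budget_ms: int) -> Model:
    """Pick the cheapest model that meets capability and latency constraints."""
    candidates = [
        m for m in CATALOGUE
        if (m.supports_tools or not needs_tools)
        and m.p50_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        # Nothing fits the budget: fall back to the most capable model
        # rather than failing the request outright.
        return CATALOGUE[-1]
    return min(candidates, key=lambda m: m.usd_per_1k_tokens)
```

In production the same shape gains health checks and fallback chains, but the decision stays a filter plus a cost sort.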

02
What we don't do

Honest about boundaries.

Counter-positioning. Where we're not the right call, we say so up front.

Foundation model R&D

Not our edge. We integrate, evaluate and operate — we don't pretrain.

Pure prompt-engineering no-code agents

Not engineering. Use Make, Zapier or n8n directly — they're better at it.

One-off chatbot widgets without evals

We won't ship something we can't measure. If there's no eval surface, there's no engagement.

03
How we work

Discovery, build, operate, hand off.

One team carries the engagement from first scope to production support. No vendor relay race.
01

Discovery — 2 to 6 weeks, flat fee

Architecture, scope, risks, costs. Output is a plan you could build with anyone, including not us.

02

Build — 8 to 20 weeks, weekly milestones

Senior engineers, weekly demos from real branches, merges into main from week one.

03

Operate — ongoing, monthly retainer

Eval gates in CI, production traces sampled and scored, on-call coverage when agreed.

04

Hand off — optional

When your team is ready to own it, we leave runbooks, ADRs, and a handover plan that holds.

04
Stack · our defaults

The tools we reach for first.

Vendor-neutral on principle. The stack shifts per engagement; these are our defaults.
Modeling
  • Anthropic Claude
    Default reasoning + agent tool use
  • OpenAI GPT-4 / GPT-5
    Multimodal, strong tool routing
  • Open-source via vLLM
    When latency or data residency demands it
  • Voyage / Cohere
    Embeddings, reranking
  • Together AI
    Routing across open weights
Retrieval
  • Postgres + pgvector
    First choice up to ~10M chunks
  • Weaviate
    When schema and scale grow together
  • Qdrant
    When sub-100ms latency is the constraint
  • Tantivy / Meilisearch
    Lexical half of hybrid search
  • BM25 hybrid
    Default rank-fusion layer
Infrastructure
  • Vercel / AWS / GCP
    Per workload — not religious
  • Temporal
    Long-running, restartable workflows
  • Inngest / SQS
    Event-driven background work
  • Langfuse / Braintrust
    Tracing + eval surface
  • Sentry / Grafana
    Errors, latency, cost panels
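The "BM25 hybrid" default above usually means fusing the lexical and semantic result lists with reciprocal rank fusion. A minimal sketch; the document IDs are made up, and k=60 is the conventional RRF constant:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: pgvector (semantic) and BM25 (lexical).
semantic = ["doc-7", "doc-2", "doc-9"]
lexical = ["doc-7", "doc-4", "doc-2"]
fused = rrf_fuse([semantic, lexical])
```

Documents ranked highly by both retrievers float to the top without any score normalisation, which is why rank fusion is a robust default before reaching for a learned reranker.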
05
The differentiator

Evals first. Always.

Most AI projects don't fail in production. They fail in the absence of measurement.
  • Evaluation datasets built with your engineering team
  • Pre-deploy eval gates in CI
  • Production traces sampled, scored, fed back
$ ai-pipeline ship \
    --eval-gate=v0.4 \
    --regression-threshold=0.02 \
    --trace=production

  [eval] running 145 cases against gate v0.4
  [eval] 142 / 145 passed  (97.93%)
  [eval] 3 regressions on long-context retrieval (set: rag.long-ctx.v3)
         · case  087  rouge-l   0.71 → 0.62  (-0.09)
         · case  104  rouge-l   0.69 → 0.61  (-0.08)
         · case  121  citation  1.00 → 0.83  (-0.17)

  ✗ blocked. regression > threshold (0.02)
  → inspect:  https://app.braintrust.dev/auralink/rag/runs/8b4f...
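A gate like the run above reduces to a per-case comparison against a stored baseline. A hedged sketch: the case IDs, scores, and threshold mirror the transcript; everything else is illustrative:

```python
def gate(baseline: dict[str, float], current: dict[str, float],
         threshold: float = 0.02) -> tuple[bool, list[str]]:
    """Block the deploy if any eval case regresses by more than `threshold`."""
    regressions = [
        case for case, base_score in baseline.items()
        if base_score - current.get(case, 0.0) > threshold
    ]
    return (len(regressions) == 0, regressions)

# Scores per eval case (e.g. rouge-l or citation accuracy, 0..1).
baseline = {"case-087": 0.71, "case-104": 0.69, "case-121": 1.00}
current = {"case-087": 0.62, "case-104": 0.61, "case-121": 0.83}
passed, failing = gate(baseline, current)
# passed is False: all three cases regressed past the 0.02 threshold.
```

The real harness adds per-metric thresholds and aggregate statistics, but the blocking decision is exactly this comparison.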
06
We stay until it's stable

Operations is the work.

The numbers we hit across our production AI engagements — reported, not projected.
P01 · 2024–2025
<1.4s

p99 agent latency, median across 8 production systems.

P02 · Multi-tenant fleet
99.95%

Uptime SLO target, enforced via fallback chains.

P03 · All engagements
$/1k

Per-tenant cost dashboards — default deliverable, not a paid add-on.
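A $/1k figure on a panel like P03 is just token spend normalised per thousand requests. A minimal sketch with made-up tenant numbers and prices:

```python
def cost_per_1k_requests(requests: int, total_tokens: int,
                         usd_per_1k_tokens: float) -> float:
    """Blended model spend, normalised to 1,000 requests."""
    total_cost_usd = (total_tokens / 1000) * usd_per_1k_tokens
    return total_cost_usd / requests * 1000

# Hypothetical tenant: 50k requests consuming 40M tokens at $0.003 / 1k tokens.
figure = cost_per_1k_requests(50_000, 40_000_000, 0.003)
```

Break the same sum out per tenant and per model and the dashboard falls out of the billing data you already have.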

08
Pricing & engagement

Three engagement models. Honest ranges.

We don't list standard packages because there aren't any. Every engagement is scoped on a call.
Flat fee

Discovery

2–6 weeks
€15k–€40k
  • Architecture and ADRs
  • Risk register
  • Cost bands
T&M or fixed scope

Build

8–20 weeks
€40k–€250k
  • Senior engineers
  • Weekly demos
  • Eval-gated deploys
Monthly retainer

Operate

Ongoing
from €6k / mo
  • On-call coverage
  • Eval + regression
  • Cost ops


09
FAQ

Six questions we get every week.

  • How small is too small?

    Anything under four weeks of build, we recommend a Discovery only. We won't sell you a multi-month engagement we can't justify.

  • Do you sign DPAs / SOC2 / NDAs?

    Yes. NDAs by default; DPAs and SOC2-aligned controls on request, with our standard policies on file.

  • Can you work with our existing engineering team?

    Yes. We pair with in-house engineers, share branches, do code review, and write the kind of documentation your team can own after we leave.

  • We already use a specific vendor (OpenAI, Anthropic, etc.). Is that fine?

    Vendor-neutral on principle. We'll work inside the constraints you've committed to and tell you honestly when a different choice would change the outcome.

  • What if the AI part doesn't pan out?

    Discovery is structured to find that out before you spend a build budget. Half the value of Discovery is sometimes deciding not to build the thing.

  • Where are you?

    Tel Aviv. Remote across EU, Israel and US time zones. We overlap with London/Berlin in the morning and East Coast US in the afternoon.

Have a system that needs to work, not just demo?

Tell us what you're trying to ship. We'll reply with concrete next steps — usually within two business days.