Open Source

Tools and reference implementations for building production AI systems. Everything here is built for real-world use—not demos.

LLM Eval Harness

In development

Python

Minimal evaluation harness for gating LLM releases. Run gold-set evals in CI/CD and block deployments if precision or recall drops below your SLOs. Includes a sample invoice dataset and scoring scripts.

Evaluation · CI/CD · Gold Sets
Extracting from client work (NDA)
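
As a rough illustration of the gating idea behind the LLM Eval Harness, the sketch below scores predicted fields against a gold set and reports whether the run clears the thresholds. It is not the harness's actual code, which is not yet published: the names `score`, `gate`, and `GateResult`, the field-level exact-match scoring rule, and the sample invoice records are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    precision: float
    recall: float
    passed: bool


def score(gold, predicted):
    """Field-level precision and recall over paired records (hypothetical rule).

    A predicted field is a true positive only if the gold record has the same
    key with an equal value. A wrong value counts as both a false positive
    and a false negative.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of records")

    tp = fp = fn = 0
    for gold_rec, pred_rec in zip(gold, predicted):
        for key in gold_rec.keys() | pred_rec.keys():
            if key in gold_rec and key in pred_rec and gold_rec[key] == pred_rec[key]:
                tp += 1
            else:
                if key in pred_rec:
                    fp += 1  # predicted something that is absent or wrong
                if key in gold_rec:
                    fn += 1  # gold field was missed or wrong

    # An empty evaluation scores 0.0 so that it can never pass the gate.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def gate(gold, predicted, min_precision, min_recall):
    """Compare scores against SLO thresholds (thresholds are caller-supplied)."""
    precision, recall = score(gold, predicted)
    passed = precision >= min_precision and recall >= min_recall
    return GateResult(precision=precision, recall=recall, passed=passed)


# Made-up sample records, standing in for a gold set and model output.
gold = [
    {"invoice_id": "A1", "total": "100.00"},
    {"invoice_id": "A2", "total": "250.50"},
]
predicted = [
    {"invoice_id": "A1", "total": "100.00"},
    {"invoice_id": "A2", "total": "250.00"},  # wrong total
]

result = gate(gold, predicted, min_precision=0.9, min_recall=0.9)
print(result)  # passed is False here: precision and recall are both 0.75
```

In a CI job, a wrapper script could end with `sys.exit(0 if result.passed else 1)` so that a failed gate fails the pipeline step and blocks the deployment.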

Private Doc-Intelligence Stack

In development

Helm · Python

Reference Kubernetes deployment for document intelligence, covering ingestion, OCR, extraction, and eval services. Ships with Helm charts, Grafana dashboards, and rollback plans. Run it with docker-compose for local development or on Kubernetes for production.

Kubernetes · OCR · Document AI · Reference Architecture
Extracting from client work (NDA)

Philosophy

Most open-source AI projects are demos: they show what's possible but skip the hard parts of production. These projects are built differently, around four principles:

  • Eval-first — Every project includes evaluation tooling. If you can't measure it, you can't trust it.
  • Production-ready — Helm charts, dashboards, rollback plans. Not just a Python script.
  • Well-documented — READMEs that explain the tradeoffs, not just the happy path.
  • Private-first — Everything runs in your environment. No external APIs, no data leaks.

More Projects in Progress

I'm currently extracting reusable components from client projects. They will be open-sourced as reference implementations once they are properly documented and anonymized.

Follow me on GitHub for updates, or get in touch if you have specific needs.

Need help deploying private AI systems in your environment?