Open Source

Tools and reference implementations for building production AI systems. Everything here is built for real-world use—not demos.

LLM Eval Harness

In development

Python

Minimal evaluation harness for gating LLM releases. Run gold-set evals in CI/CD and block a deployment if precision or recall drops below your SLOs (sketched below). Includes a sample invoice dataset and scoring scripts.

Evaluation · CI/CD · Gold Sets
Preparing for public release
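To make the gating idea concrete, here is a minimal sketch of what such a CI gate could look like, assuming gold labels and model predictions stored as JSONL keyed by example id with a binary label per example. The file names, label schema, script name, and threshold defaults are illustrative assumptions, not the harness's actual interface.

#!/usr/bin/env python3
"""Hypothetical CI gate: fail the pipeline if gold-set metrics fall below SLOs."""
import argparse
import json
import sys


def load_labels(path):
    """Read one JSON object per line, keyed by example id, with a 'label' field."""
    with open(path, encoding="utf-8") as f:
        return {row["id"]: row["label"] for row in map(json.loads, f)}


def precision_recall(gold, pred, positive="positive"):
    """Precision/recall for the positive class; missing predictions count as misses."""
    tp = sum(1 for i, p in pred.items() if p == positive and gold.get(i) == positive)
    fp = sum(1 for i, p in pred.items() if p == positive and gold.get(i) != positive)
    fn = sum(1 for i, g in gold.items() if g == positive and pred.get(i) != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def main():
    ap = argparse.ArgumentParser(description="Gate a release on gold-set metrics.")
    ap.add_argument("--gold", required=True, help="gold labels, JSONL")
    ap.add_argument("--pred", required=True, help="model predictions, JSONL")
    ap.add_argument("--min-precision", type=float, default=0.95)
    ap.add_argument("--min-recall", type=float, default=0.90)
    args = ap.parse_args()

    precision, recall = precision_recall(load_labels(args.gold), load_labels(args.pred))
    print(f"precision={precision:.3f} recall={recall:.3f}")

    # A non-zero exit fails the CI/CD job, and therefore blocks the deployment.
    if precision < args.min_precision or recall < args.min_recall:
        print("SLO violated: blocking release", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()

In a pipeline, a gate like this would run as a step after the eval job, e.g. python ci_gate.py --gold gold.jsonl --pred predictions.jsonl, with the non-zero exit status failing the build and blocking the deploy.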

Private Doc-Intelligence Stack

In development

Helm · Python

Reference Kubernetes deployment for document intelligence, covering ingestion, OCR, extraction, and eval services. Ships with Helm charts, Grafana dashboards, and rollback plans. Deploy with docker-compose for local development (sketched below) or on K8s for production.

Kubernetes · OCR · Document AI · Reference Architecture
Preparing for public release
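As a rough illustration of the local-dev path, a docker-compose file wiring up the four services might look something like the following. The service layout mirrors the description above, but the image names and ports are placeholders, not the published stack.

# Illustrative docker-compose sketch for local development only.
# Image names and ports are placeholders, not the published stack.
services:
  ingestion:
    image: doc-intel/ingestion:local
    ports: ["8001:8000"]
  ocr:
    image: doc-intel/ocr:local
    ports: ["8002:8000"]
  extraction:
    image: doc-intel/extraction:local
    ports: ["8003:8000"]
  eval:
    image: doc-intel/eval:local
    ports: ["8004:8000"]

With a file along these lines, docker compose up brings the pipeline up end to end on a laptop, while the Helm charts would target the same services on a cluster.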

Philosophy

Most open-source AI projects are demos—they show what's possible but skip the hard parts of production. These projects are different. They're built to solve real problems:

  • Eval-first — Every project includes evaluation tooling. If you can't measure it, you can't trust it.
  • Production-ready — Helm charts, dashboards, rollback plans. Not just a Python script.
  • Well-documented — READMEs that explain the tradeoffs, not just the happy path.
  • Private-first — Everything runs in your environment. No external APIs, no data leaks.

More Projects in Progress

Currently building reusable components and reference implementations. These will be open-sourced once properly documented and tested.

Follow me on GitHub for updates, or get in touch if you have specific needs.

Need help deploying private AI systems in your environment?