AI systems architect · 20+ years
Production AI systems — RAG, agents, and applied AI that delivers.
I'm Yossi Gordin, ex-VP R&D/CTO, with 20+ years building software. I architect and ship grounded RAG systems, computer-vision tools, and agentic tools, and I roll out enterprise AI-assisted development without leaking your code.
Built with
- 20+ yrs engineering leadership
- 401K+ records in a live RAG
- −84% API p95 latency cut
- 70–90% VLM cost cut (dvr_ai)
What I do
Five things, done to production standard
Not a generic dev shop. Deep, demonstrable work in the parts of AI that are hard to get right in production.
AI systems architecture & LLM integration
The end-to-end design that turns a model demo into a system that survives production.
RAG systems, multilingual ready
Retrieval that answers from your data — grounded in citations, evaluated, and safe to put in front of users.
Computer vision & multimodal AI
Real-time video and image AI that's accurate — and cheap enough to run continuously.
Agentic systems
Tool-calling agents that are orchestrated, evaluated, and benchmarked — not vibes.
Enterprise AI-assisted development
Roll out Claude Code across your team without your source code leaving your boundary.
Selected work
Shipped, not slideware
A natural-language market-intelligence assistant over 401K+ records
A natural-language assistant for price analysis and market discovery, grounded in a 401K+ record corpus — answers without hallucinated prices or specs.
GitLab Ultimate CI/CD migration away from tag proliferation
Migrated a sprawling tag-based release process to clean branch/environment CI/CD on GitLab Ultimate, with LDAP/AD and approval gates.
dvr_ai — real-time AI video surveillance for loss prevention
Monitors DVR/RTSP feeds for cash theft, sweethearting and restricted-access events — motion-gating cuts VLM API cost 70–90%.
poker-copilot — real GTO solver maths with agentic orchestration
Production poker analysis using real solver mathematics (no approximations), orchestrated by a Groq ReAct agent — sub-5s with caching.
Articles
Notes from production
Stop your RAG system hallucinating
Most RAG hallucinations are retrieval failures, not generation failures. Diagnose which, ground answers in cited context, make the model abstain, and track faithfulness.
Self-hosting open models vs an API: where the cost actually crosses over
Self-hosting open-weight models beats API pricing at high steady throughput or under data-residency rules; APIs win for spiky, low-volume, or frontier-quality work.
Model routing: stop sending every request to your biggest model
Most LLM traffic doesn't need a frontier model. Route by rules, a classifier, or a cascade to cut spend several-fold without silently degrading quality.
Have a system that needs to reach production?
Book a 15-minute call. No sales pitch — a straight technical conversation about your problem and whether I'm the right person to solve it.