Free Lightning Lessons to Advance Your RAG Implementations
I'll be hosting industry experts to share practical techniques for enhancing your Retrieval Augmented Generation (RAG) systems.
Note: this post is archival. I’m no longer taking consulting engagements or running cohorts.
Let me share something I wish I'd understood sooner: consistent content creation isn't just a marketing tactic—it's the foundation of a thriving consulting business.
When I started my consulting journey, I was stuck in the time-for-money trap. I'd jump on Zoom calls with prospects, explain the same concepts repeatedly, and wonder why scaling was so difficult. Then I had a realization that changed everything: what if I could have these conversations at scale?
Now I extract blog post ideas from every client call. Every Friday, I review about 17 potential topics from the week's conversations. I test them with social posts, see which ones get traction (some get 700 views, others 200,000), and develop the winners into comprehensive content.
Here's why this approach has transformed my business:
Imagine this: you open Cursor, ask it to build a feature in YOLO-mode, and let it rip. You flip back to Slack, reply to a few messages, check your emails, and return...
It's still running.
What the hell is going on? .sh files appear, there's a fresh Makefile, and a mysterious .gitignore. Anxiety creeps in. Should you interrupt it? Could you accidentally trash something critical?
Relax—you're not alone. This anxiety is common, especially among developers newer to powerful agents like Cursor's. Fortunately, Git is here to save the day.
In Part 1, you learned the basics of safely using Git with Cursor agents. Now, let's level up your workflow by diving into advanced Git practices and explicitly instructing Cursor to handle these for you.
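One of those basics is worth seeing in code before we go further. A common safeguard is to snapshot the repo on a throwaway branch before letting an agent loose, so anything it touches is trivial to inspect or revert. Below is a minimal sketch of that idea as a small Python helper that shells out to git; the `agent/` branch prefix and the label are just illustrative conventions, not anything Cursor requires.

```python
import subprocess

def git(*args: str) -> str:
    """Run a git command and return its stdout (raises if git fails)."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()

def checkpoint(label: str) -> str:
    """Park all current work on a throwaway branch before an agent run."""
    branch = f"agent/{label}"
    git("checkout", "-b", branch)   # isolate the agent's changes on their own branch
    git("add", "-A")                # stage everything, tracked or not
    git("commit", "--allow-empty", "-m", f"checkpoint before agent run: {label}")
    return branch

# checkpoint("yolo-feature")  # then let the agent rip
```

From there the agent's spree stays isolated: keep the branch if the run went well, or switch back to your original branch and delete it if it didn't.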
Systematically improving RAG systems
This transcript is based on a guest lecture from my Systematically Improving RAG Applications series.
Retrieval-Augmented Generation (RAG) systems have become essential tools for enterprises looking to harness their vast repositories of internal knowledge. While the theoretical foundations of RAG are well-understood, implementing these systems effectively in enterprise environments presents unique challenges that aren't addressed in academic literature or consumer applications. This article delves into advanced techniques for fine-tuning embedding models in enterprise RAG systems, based on insights from Manav Rathod, a software engineer at Glean who specializes in semantic search and ML systems for search ranking and assistant quality.
The discussion focuses on a critical yet often overlooked component of RAG systems: custom-trained embedding models that understand company-specific language, terminology, and document relationships. As Jason Liu aptly noted during the session, "If you're not fine-tuning your embeddings, you're more like a Blockbuster than a Netflix." This perspective highlights how critical embedding fine-tuning has become for competitive enterprise AI systems.
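Glean's production setup isn't something I can reproduce here, but the core move (continuing to train an off-the-shelf embedding model on your own query-document pairs) is easy to sketch. Below is a minimal example using the sentence-transformers library with in-batch negatives; the base model, the two toy pairs, and the hyperparameters are placeholders rather than recommendations from the talk.

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Start from a general-purpose model and adapt it to company-specific language.
model = SentenceTransformer("all-MiniLM-L6-v2")

# (query, relevant document) pairs mined from search logs, clicks, or labeling.
train_examples = [
    InputExample(texts=["how do I file a SOC 2 exception",
                        "SOC 2 exceptions are filed through the compliance portal under..."]),
    InputExample(texts=["rotate the staging Okta token",
                        "To rotate an Okta API token, open Admin > Security > API and..."]),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=16)

# In-batch negatives: every other document in the batch serves as a negative.
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
model.save("company-tuned-embeddings")
```

The payoff is exactly the one described above: the tuned model starts pulling together queries and documents that share your company's terminology, not just generic English.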
Here's the thing about RAG (Retrieval-Augmented Generation): everyone's obsessed with fancy embeddings and vector search, but they're missing something crucial – authority matters just as much as relevance.
My students constantly ask about a classic problem: "What happens when new documents supersede old ones?" A technical guide from 2023 might be completely outdated by a 2025 update, but pure semantic search doesn't know that. It might retrieve the old version simply because the embedding is marginally closer to the query.
This highlights a bigger truth: relevancy, freshness, and authority are all critical signals that traditional information retrieval systems juggled effectively. Somehow we've forgotten these lessons in our rush to build RAG systems. The newest and shiniest AI technique isn't always the complete solution.
I've spent years working with ML systems, and I've seen this pattern repeatedly. We get excited about semantic search, but forget the hard-won lessons from decades of information retrieval: not all sources deserve equal trust.
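What does "not all sources deserve equal trust" look like in ranking code? One simple option is to re-score retrieved candidates with a blend of semantic similarity, freshness, and source authority before handing them to the LLM. The sketch below is illustrative only: the weights and the 180-day half-life are starting points to tune against your own evals, and the authority score is whatever your organization can defensibly assign (official docs high, stale wikis low).

```python
from datetime import datetime, timezone

def blended_score(semantic: float, updated_at: datetime, authority: float,
                  half_life_days: float = 180.0,
                  w_sem: float = 0.6, w_fresh: float = 0.25, w_auth: float = 0.15) -> float:
    """Re-rank a retrieved chunk on relevance, freshness, and authority.

    semantic   -- similarity from the vector store, assumed scaled to [0, 1]
    updated_at -- timezone-aware last-modified timestamp of the document
    authority  -- trust in the source, e.g. 1.0 for the official 2025 guide,
                  0.3 for an unreviewed page that may be superseded
    """
    age_days = (datetime.now(timezone.utc) - updated_at).days
    freshness = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return w_sem * semantic + w_fresh * freshness + w_auth * authority
```

With something like this in place, the 2023 guide can still win on raw similarity yet lose the overall ranking to its 2025 replacement.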
I never planned to become a consultant. But somewhere between building machine learning systems and growing my Twitter following, companies started sliding into my DMs with the same message: "Help, our AI isn't working."
So I started charging to join their stand-ups. Sometimes I didn't even code. I just asked uncomfortable questions.
Here's what I've learned watching companies burn millions on AI.
This artifact packages the decision flow, cost math, and operational playbook so the team can set defensible thresholds and understand trade-offs at a glance.
Retrieval-Augmented Generation (RAG) is a simple, powerful idea: connect a large language model (LLM) to external data and unlock better, domain-specific outputs. Yet behind that simplicity lurks a maze of hidden pitfalls: no metrics, no data instrumentation, not even clarity about what exactly we’re trying to improve.
In this mega-long post, I’ll lay out everything I know about systematically improving RAG apps—from fundamental retrieval metrics, to segmentation and classification, to structured extraction, multimodality, fine-tuned embeddings, query routing, and closing the loop with real user feedback. It’s the end-to-end blueprint for building and iterating a RAG system that actually works in production.
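To make "fundamental retrieval metrics" concrete before we start: the two I reach for first are recall@k and MRR, computed over question → relevant-chunk pairs (synthetic or mined from user feedback). A minimal sketch, with placeholder chunk IDs:

```python
def recall_at_k(relevant_ids: set[str], retrieved_ids: list[str], k: int = 10) -> float:
    """Fraction of the known-relevant chunks that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / max(len(relevant_ids), 1)

def mrr(relevant_ids: set[str], retrieved_ids: list[str]) -> float:
    """Reciprocal rank of the first relevant chunk (0.0 if none is retrieved)."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# One synthetic question: the chunk it was generated from should come back.
print(recall_at_k({"chunk-42"}, ["chunk-7", "chunk-42", "chunk-13"], k=3))  # 1.0
print(mrr({"chunk-42"}, ["chunk-7", "chunk-42", "chunk-13"]))               # 0.5
```

Every technique later in the post can then be judged by whether it moves numbers like these on your own data, rather than by vibes.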
I’ve spent years consulting on applied AI—spanning recommendation systems, spam detection, generative search, and RAG. That includes building ML pipelines for large-scale recommendation frameworks, vision-based detection, curating specialized datasets, and more. In short, I’ve seen many “AI fails” up close. Over time, I’ve realized that gluing an LLM to your data is just the first step. The real magic is how you measure, iterate, and keep your system from sliding backward.
We’ll break everything down in a systematic, user-centric way. If you’re tired of random prompt hacks and single-number “accuracy” illusions, you’re in the right place.
Let me share a story that might sound familiar.
A few months back, I was helping a Series A startup with their LLM deployment. Their CTO pulled me aside and said, "Jason, we're burning through our OpenAI credits like crazy, and our responses are still inconsistent. We thought fine-tuning would solve everything, but now we're knee-deep in training data issues."
Fast forward to today, and I’ve been diving deep into these challenges as an advisor to Zenbase, a production-level version of DSPy. We’re on a mission to help companies get the most out of their AI investments. Think of them as your AI optimization guides: they've been through the trenches, made the mistakes, and now they’re here to help you avoid them.
In this post, I’ll walk you through some of the biggest pitfalls. I’ll share real stories, practical solutions, and lessons learned from working with dozens of companies.