AI and automation engineers must constantly balance the speed of the tools at their disposal against the accuracy of the results those tools produce.
Luckily, Mike Woolley, director of AI and automation solutions at Ibotta, has a few tricks for keeping the team's balance: system prompts and tests.
“A well-written system prompt helps ensure AI responds with relevant data in a format that makes sense to the user,” Woolley said. “It can also enforce transparency by having the AI display its reasoning and citations.”
Built In spoke with Woolley in detail about how engineers at the performance marketing platform use careful measurements, tests and prompts to ensure AI success.
Ibotta is a performance marketing platform that allows brands to deliver digital promotions to millions of consumers through a network of publishers called the Ibotta Performance Network.
What’s your rule for fast, safe releases — and what KPI proves it works?
Tooling in the AI space has made significant advancements. If we think about applying AI to information gathering and research, it’s entirely possible for a software engineer to compose an agent that returns insights in a matter of hours. While this speed is exciting, we must balance it by ensuring the system is correct.
Solving this requires two main things — a good system prompt and tests.
A well-written system prompt helps ensure AI responds with relevant data in a format that makes sense to the user. It can also enforce transparency by having the AI display its reasoning and citations.
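A system prompt with the two properties described here, a fixed response format plus mandatory reasoning and citations, might look something like the following. The wording is a hypothetical illustration, not Ibotta's actual prompt.

```python
# Hypothetical system prompt illustrating the two properties named above:
# a predictable response format, and enforced transparency via
# reasoning and citations. Illustrative only.
SYSTEM_PROMPT = """\
You are an internal knowledge assistant.
- Answer only from the provided documents.
- Format every response in three sections: Answer, Reasoning, Citations.
- Under "Citations", list the title of each document you relied on.
- If the documents do not contain the answer, say so explicitly
  instead of guessing.
"""
```

Constraining the format up front is also what makes the responses testable later: an evaluation can check mechanically for the presence of the Reasoning and Citations sections.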
We can then test the correctness of these responses through evaluations. Evaluations pair a set of user prompts with expected results, which can be compared against the actual AI responses. Strong evaluations also include mechanisms to test the AI's reasoning, as a way to catch hallucinations.
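The evaluation idea can be sketched as a small harness: each case pairs a user prompt with expected facts and a citation requirement, and an answer passes only if it satisfies both. The data structures and the `[source: ...]` citation marker are assumptions for illustration, not Ibotta's actual implementation.

```python
# Minimal sketch of an evaluation harness in the spirit described above.
# Each case holds a prompt, the facts the answer must contain, and
# whether a source citation is required. All names are illustrative.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected_facts: list[str]   # substrings the answer must contain
    requires_citation: bool     # must the answer cite a source?

def grade(case: EvalCase, answer: str) -> bool:
    """Pass if every expected fact appears and, when required,
    a citation marker like "[source: ...]" is present."""
    text = answer.lower()
    facts_ok = all(fact.lower() in text for fact in case.expected_facts)
    citation_ok = (not case.requires_citation) or ("[source:" in text)
    return facts_ok and citation_ok

def pass_rate(cases: list[EvalCase], answers: list[str]) -> float:
    """Fraction of cases whose actual answer passed."""
    results = [grade(c, a) for c, a in zip(cases, answers)]
    return sum(results) / len(results)
```

Running a suite like this on every build is what turns "seeing a high pass rate" into a trend that can be tracked in a deployment pipeline.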
We’ve built these evaluations into multiple levels of our system and deployment pipelines, so we can track correctness trends as we build and iterate on the agents we’re rolling out. Seeing a high pass rate is a strong signal that we’re delivering something safe.
“Seeing a high pass rate is a strong signal that we’re delivering something safe.”
What standard or metric defines “quality” in your stack?
The engineering organization I lead is focused on bringing AI and automation technologies to our business. Our focus includes making these technologies self-service as well as directly rolling out agentic capabilities as turnkey solutions.
The main marker of quality for our solutions is whether what we’re building makes our partners in the business more effective in their roles. There are classic metrics to be applied: time saved, money not spent, tests passed, changes merged.
More important is the standard within the engineering team itself. Do we understand what the business needs? Are we making practical calls on what is a fit for AI or not? Are our solutions being used? Are we responding to feedback? Are we thought leaders who are driving what’s possible in the space? Achieving these points requires close partnership and a foundation of engineering expertise. If the answers to these questions are largely positive, then the quality of our solutions tends to follow.
Name one AI/automation that shipped recently and its impact on your team.
Our main focus has been on using AI to connect our employees to company knowledge. Knowledge at Ibotta takes many forms: handbooks, data, emails, playbooks, etc. Centralizing this knowledge and wrapping AI around it equips our employees to ask questions and retrieve information more quickly and consistently than manual research.
Our first application has been within our legal domain. We’ve crafted an AI agent that serves as a legal assistant. When asked a question, it will browse our internal knowledge to retrieve an answer, complete with source citations and any exceptions. These answers come back in seconds, as opposed to potentially hours or days of research.
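The retrieve-then-answer pattern behind an assistant like this can be sketched in a few lines: rank internal documents by relevance to the question, then pass the top matches to the model with a citation requirement. The word-overlap scoring and document store below are simplifying assumptions for illustration; a production system would use a proper search index.

```python
# Sketch of the retrieval step behind a cite-your-sources assistant.
# Documents are a mapping of title -> text; relevance is approximated
# by word overlap with the query. Illustrative only.
def retrieve(query: str, documents: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the titles of the top_k documents sharing the most words
    with the query, most relevant first."""
    words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda title: len(words & set(documents[title].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

# Usage: the retrieved titles double as the citations the answer must carry.
docs = {
    "nda-playbook": "nda terms and review steps",
    "expense-policy": "travel expense rules",
}
sources = retrieve("how do we review nda terms", docs, top_k=1)
```

Because the answer is assembled from named documents, the citations come for free, which is what lets a reader verify the assistant's response instead of taking it on faith.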
We’re working to apply the same concept in our sales domain. By consolidating knowledge — things like seller playbooks, leads, contact history, etc. — we equip the team to gain insights across a variety of topics. How have meetings with a certain client been? What are the gaps we’re seeing across a certain vertical? How are clients responding to a new product? AI can answer these questions accurately and quickly, equipping us to focus on making the sale, rather than researching what action to take.
