Tin can in the sky
A few years ago, I was flying across the country for a business trip and counting on the flight to get some work done. Unfortunately, the wifi was out for the entire trip. When I landed, I complained about it to a coworker.
"Yeah, that's annoying," he said. "On the other hand, you just spent a few hours inside a tin can in the sky, and now you're 3,000 miles away from where you had breakfast this morning. Technology is amazing!"
Take two things from this story. One, it is humbling to be surrounded by optimists. Two, we take groundbreaking technology for granted almost immediately after it becomes available.
The tin can in the sky brought me to NYC in under six hours. Cool! But I also really needed it to have wifi, so I felt disappointed rather than awe-struck.
AI is basically a tin can in the sky
I was thinking about this again the other day when I was unable to use AI to build the slides I wanted. AI can do all kinds of things that few imagined a few years ago. As a result, consumers have high expectations. When AI fails to live up to those expectations, we don't think, "This technology is amazing!" We think, "How hard is it to make a couple of slides?"
I'm seeing this all over the place with AI right now. In the last week alone, I've heard people complain that AI has failed them at one task after another.
To hear the complaints, you would think AI isn't providing the complainers much value at all. But underneath all of them, including mine, is the reality: we notice the failures because we're using AI a whole lot. The vast majority of the time, AI is succeeding, or we wouldn't keep counting on it.
For years now, 99+% of my flights have had wifi. I let those trips pass unremarked upon. But I've complained loudly about the tiny fraction where wifi is broken.
User expectations are the only evals that matter
If you're building any kind of software product, especially with AI, you're probably familiar with the concept of evals. If not, think of evals as simple tests that check whether an AI is doing what it’s supposed to do in real situations.
Let's use a non-tech example: If you're writing a recipe for cheesecake, you might ask five home cooks of differing skill levels to bake a cake using the recipe. If each home cook's final cake looks and tastes like the author's cake, the recipe passes the most basic eval: it gives a predictable result.
Evals are especially important in AI products because AI makes stuff up. You only know that a product works reliably when it passes a rigorous, comprehensive set of evals.
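To make the cheesecake analogy concrete, here is a minimal sketch of what an eval might look like in code. Everything here is hypothetical: the model call is a stub, and the cases and pass criteria are invented for illustration.

```python
# Minimal sketch of an automated eval: run the AI on fixed inputs
# and check each output against a simple, predefined expectation.

def model_summarize(text: str) -> str:
    # Stand-in for a real model call (hypothetical): return the first sentence.
    return text.split(".")[0] + "."

EVAL_CASES = [
    # (input, check) pairs: each check is a predicate on the model's output.
    ("Flights are fast. Wifi is flaky.", lambda out: len(out) < 40),
    ("Evals catch regressions. Run them often.", lambda out: out.endswith(".")),
]

def run_evals() -> float:
    # Return the fraction of cases where the output passed its check.
    passed = sum(check(model_summarize(text)) for text, check in EVAL_CASES)
    return passed / len(EVAL_CASES)

print(f"pass rate: {run_evals():.0%}")
```

Like the five home cooks baking the same cheesecake, the point is repeatability: the same inputs should keep producing acceptable outputs as the product changes.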
When you have an early product, your users tend to be more tolerant of errors, especially if they're using the product for free. But as your product gets better and more reliable, user expectations go up. What previously seemed amazing becomes table stakes. When you fly across the country, you expect the plane to have wifi. User expectations are the only evals that matter.
AI products are in a weird spot right now. I have used AI to write more solid code in the last couple of months than I have in the rest of my life combined. I have completed a hundred previously manual data analysis tasks in a few minutes. AI is a seriously impressive tin can in the sky.
On the other hand, I haven't yet built the agent that makes my slides any good. And because my expectations are sky-high, this makes me cranky.
Kieran
If you liked this story, why not subscribe to nerd processor and get the back issues? Also, why not learn to tell data stories of your own?
My latest data stories | Build like a founder | nerdprocessor.com