The problem with AI in one image


Hey DALL-E!

A couple of years ago, I wrote a fun blog series for Textio where I asked ChatGPT to write sample critical feedback for employees of various backgrounds. I structured the queries into pairs with only one key difference within the pair: the theoretical employee's alma mater e.g.

  • "Write me sample critical feedback for a digital marketer who had a tough first year on the job after graduating from Harvard University"
  • "Write me sample critical feedback for a digital marketer who had a tough first year on the job after graduating from Howard University"

Unsurprisingly, the output was a little bland, but for any given example, it more or less looked plausible. It's only when we looked at the whole data set together that we saw the patterns. The theoretical alums from Howard, a prominent Historically Black College/University, were criticized for missing functional skills and lack of attention to detail. By contrast, the theoretical Harvard grads were asked to improve their performance by stepping up to lead more.

Huh.

Where's Waldo?

The Howard/Harvard data is fascinating because you can't see the bias in any one document. But as with a lot of AI, when you look at the details of the set as a whole, the problematic pattern emerges.

The best way to understand why you can’t automatically trust the output of ChatGPT, Claude, and other general-purpose AI functionality (unless the vendor is verifying output quality on a case-by-case basis, in their UI) is to look at AI image generation tools. It’s easier for our brains to spot hallucinations in images than in written text.

To illustrate with a seasonal and silly example: I asked DALL-E to generate "a work-appropriate image that shows a team that is setting big goals at an annual kickoff retreat." The image below is what it produced.

Wow, do I have a lot of questions. Why is a tsunami of surfers about to take over the corporate retreat? What's with the stage lighting? Is anyone worried about drowning or electrocution? Do you think the guy in the muscle shirt is embarrassed that he missed the memo about wearing a navy blazer? Why is the chair next to him missing an arm? And omg, why are they all 34yo white dudes? (JK on that one, we know why. Businesses need more masculine energy!)

Like a lot of AI images, this nods in the direction of being right while doing some truly bizarro things. This is almost a corporate retreat, but not quite. This a lot like what happens when you ask general-purpose AI for medical information. It can almost diagnose you properly! But not quite.

I love me some AI. I use general-purpose AI many, many times a day for inspiration and ideas. But I don’t trust its quality in the details, and you shouldn't either. Images show why.

Thanks for reading!

Kieran


Want to build your brand by telling data stories like this one? Learn how! Includes a 1-1 consult with me to get your story off the ground.

My latest data stories | Tell your own Viral Data Stories | nerdprocessor.com

kieran@nerdprocessor.com
Unsubscribe · Preferences

nerd processor

Every week, I write a deep dive into some aspect of AI, startups, and teams. Tech exec data storyteller, former CEO @Textio.

Read more from nerd processor

8 days a week The other day I came across this fascinating research by Microsoft talking about the "infinite workday." The telemetry from M365 users shows that people are regularly doing email at 6am, having meetings at 8pm, and working through the weekends. In theory, it's the time of year when work starts to slow down. The season of "let's circle back in the new year" has begun. The Microsoft research doesn't comment on seasonality, but I'm wondering: Does the infinite workday take a break...

How the sausage gets made Recently, a nerd processor reader asked me why I don't use AI to make nerd processor, since I write a lot about using AI at work. I replied that of course I use AI to make nerd processor! Just not in the way you might think. Where I don't use AI When you read "of course I use AI to make nerd processor," your brain probably went first to the obvious scenario: using AI to write copy. But I don't use AI to write. I write my own copy for the same reason I always have:...

Queen Bees and Wannabes, but make it profesh For many years, I've fantasized about creating a personality assessment that is like Queen Bees and Wannabes, but 1. not just for teenage girls and 2. for the workplace. Do I have expertise in psychometric testing? No, I do not! That's why it's just a fantasy. Your work personality and your real personality Sadly, there's no Queen Bees and Wannabes test (yet), but I've taken my fair share of personality tests. The Enneagram (8w3 at work, 3w8 in my...