Why is it so hard to get AI right?


The internet in 2024, a three-act play

Act 1: Google strikes a deal to use Reddit posts as training data
Act 2: Google swaps in AI for its usual search results
Act 3: Google tells you to eat rocks and put glue in your pizza sauce

Why is it so hard to get AI right?

Hey Siri, what's a HIPAA violation?

I had this conversation with the HR leader of a major hospital system last year, and I haven't stopped thinking about it since.

It was a few months after ChatGPT was broadly released, and she shared how enthusiastically everyone in her organization had embraced the tool. The hospital had convened a task force to look at AI, and they were busy thinking up powerful use cases to accelerate patient care.

Except that the doctors had already put an unsanctioned use case into motion, and it was enough to give the CISO a coronary: the doctors were dropping confidential patient notes into ChatGPT to write case summaries faster. Their AI firewall went up faster than you can say HIPAA violation.

Why is it so hard to get AI right?

Sparkle questions

You know those little AI sparkle questions that LinkedIn adds below most posts now? The ones that are sometimes relevant, mostly kind of weird, and sometimes real clunkers?

I saw a post someone made about surviving cancer, and the sparkle question underneath it was, “How can cancer add value to our lives?” This is all kinds of yikes, so I did the obvious thing: I wrote a post about it. Unfortunately, LinkedIn appended another dicey sparkle question to my post about the original post.

Uh oh. Why is it so hard to get AI right?

Finding terrible AI examples is easy!

There are so many transformative, wonderful AI use cases and apps. But it's just not that hard to find these terrible examples, and so many of them feel like unforced errors. Popular and trusted apps silently updating their privacy policies so they can use their users' input as training data. People dumping sensitive information into free public websites. Developers racing to implement AI features without safeguards for accuracy, privacy, or bias.

It's fair to say that I'm not a typical user; I spent a decade building proprietary technology to handle AI sensitivities at Textio, and it is very, very hard. After I saw the problematic LinkedIn questions, I started searching on different topics to see if I could figure out what would trigger those little questions.

I discovered quickly that if I searched for terms like guns or Israel, there were no sparkle questions. In other words, I could clearly see that LinkedIn has implemented some "catch" topics where the system intercepts the query and doesn't take the risk of showing inappropriate questions. But, at least until my post, cancer wasn't one of them.

In just 30 minutes of rooting around, I found dozens of other topics that occasionally triggered highly insensitive sparkle questions: anorexia, abuse, adoption, layoffs, and immigration, to name just a few. Finding terrible examples was easy.

So why is it so hard to get AI right?

LinkedIn is not alone in its approach to trying to solve this problem; this kind of query filtering is what everyone does. Remember in 2023 when conservatives blasted OpenAI for the way ChatGPT talked differently about Joe Biden and Donald Trump? OpenAI "fixed" that the fastest way they could think of: by intercepting the political queries before processing them and tossing up the equivalent of "I can't do that, Dave."

Unfortunately, this never quite works, because you just can't write rules to catch all the ways that future queries might be problematic. I've written extensively about the racist ways in which ChatGPT describes alumni of Harvard University and Howard University differently, or about the stereotypical "roses are red" poems that the system writes for people of different backgrounds. It's easy to trip over the problematic biases in most AI just by wording queries a little differently than what app developers have planned for.

The problem can't be solved by manually intercepting people's queries. The issue is with the underlying generative engine and data set, so app developers who try to intercept queries end up playing whack-a-mole with mortifying examples forever. After all, nearly any topic can be sensitive or not depending on context.
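
To make that whack-a-mole dynamic concrete, here's a minimal sketch of the kind of keyword-based interception described above. It's a hypothetical illustration in Python; the topic list and helper function are my own invention, not LinkedIn's or OpenAI's actual code.

```python
# A minimal sketch of keyword-based query interception.
# The topic list and helper below are hypothetical illustrations,
# not anyone's actual implementation.

BLOCKED_TOPICS = {"guns", "israel", "cancer"}  # grows one mortifying example at a time

def should_suppress(post_text: str) -> bool:
    """Skip AI-generated questions if the post mentions a blocked topic."""
    words = (w.strip(".,!?").lower() for w in post_text.split())
    return any(w in BLOCKED_TOPICS for w in words)

# The filter catches the exact wording you anticipated...
print(should_suppress("Reflecting on my cancer diagnosis"))        # True: suppressed

# ...but not the phrasings you didn't plan for, so the generative
# engine still sees the sensitive post and can produce an
# insensitive question anyway.
print(should_suppress("Five years since my last round of chemo"))  # False: slips through
print(should_suppress("Surviving a terminal illness at 34"))       # False: slips through
```

The rules only ever encode the failures someone has already noticed, which is exactly why teams taking this approach never run out of problematic examples.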

The LinkedIn situation is better than average; a concerned exec saw my post and shared it with the product team, who quickly replied to my public post, took responsibility, and removed the problematic sparkle question you can see in my original screenshot.

Like most teams implementing AI, the LinkedIn team would like to get it right. And like most teams implementing AI, the approach they're using to fix the offending features means that they'll never run out of problematic examples.

The LinkedIn case and the HIPAA violation case and the Google case are fundamentally similar. The problem isn't exactly with the technology. As with most technologies, the issue is with human judgment around its implementation and usage.

What do you think?

Thanks for reading!

Kieran
