Did you see the demo for the rabbit r1?
If you didn’t, that’s okay. The r1 is a “barely reviewable product” that over-promised its capabilities. In the demo above, I’ve timestamped the part in which the r1 generates a potential itinerary for a trip to London for a family of three.
I have another example of an “ai” demo that uses vacation planning as its use case. This time, unread articles and links to unwatched YouTube videos are fed into Google’s NotebookLM so it can generate a customized podcast about your potential vacation, so you don’t even have to read.

It was this example that really threw me for a loop. NotebookLM was purposefully designed for generating study tools for learning and writing. Why was Tiago making a demo showing off its capabilities for planning a trip to Guatemala, especially when the answers it returns are nonsensical?
(Tiago asks NotebookLM why Antigua is so underrated, and the answer clearly states that it isn’t, but Tiago carries on as if it had agreed?!?)

Why would he use a tool built for generating study aids like quizzes and study guides to plan a vacation? Is the ability to generate ‘key terms’, such as a list of town names in no particular order, even useful to the viewer?
And why would you try to ‘save time’ planning a vacation when the planning has been found to often feel better than the trip itself?

The choice to use vacation planning when demoing these tools seems like a tell, and I am not the only one who has noticed.

As a point of pride, I would like to include my own post, written two days earlier, which contains my theory on why “ai” demos use travel as their example: it provides a context in which the results are difficult to evaluate unless you are a local.

Marques Brownlee’s baseline for evaluating AI is to ask questions he already knows the answer to, and this should be everyone’s baseline for evaluation.
Here’s an example from my own life.
A couple of weeks ago, I was testing how Google answered the question, “Is Boolean Dead?” On March 30, 2025, this was the answer I received:

Let me draw your attention to the example that Google provided in its answer:
If you are looking for articles about “Windsor ON” and “N8Y” you could use the boolean search “Windsor ON” AND “N8Y” to get the results that contain both keywords.
This is a terrible example. What possible article would contain these phrases?
I asked other people on Mastodon what results they received for the same search, and some of the examples they got were good ones (“Alzheimer’s” NOT “dementia”).
No one else received an example containing their actual city and the forward sortation area of their own postal code. Lucky me.
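For anyone who hasn’t used boolean search, here is a minimal sketch in Python of what those operators actually do; the documents and helper functions are made up for illustration, not how Google implements search.

```python
# Illustrative boolean keyword filtering (made-up snippets, not Google's implementation).

documents = [
    "New respite program opens in Windsor ON for dementia caregivers",
    "Early signs of Alzheimer's that are often mistaken for normal aging",
    "Alzheimer's research update: new drug trial results announced",
]

def matches_and(doc: str, *terms: str) -> bool:
    """True if the document contains every quoted term (boolean AND)."""
    return all(term.lower() in doc.lower() for term in terms)

def matches_not(doc: str, keep: str, exclude: str) -> bool:
    """True if the document contains `keep` but not `exclude` (boolean NOT)."""
    return keep.lower() in doc.lower() and exclude.lower() not in doc.lower()

# "Windsor ON" AND "N8Y": both phrases must appear, which almost no article will satisfy.
print([d for d in documents if matches_and(d, "Windsor ON", "N8Y")])  # []

# "Alzheimer's" NOT "dementia": keep Alzheimer's articles, drop any that mention dementia.
print([d for d in documents if matches_not(d, "Alzheimer's", "dementia")])
```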
AI is powerful and frequently wrong.
We should not normalize the idea that incorrect results are acceptable.
I would like to end this post with some words from a recent article titled “Apple’s AI isn’t a letdown. AI is the letdown.”
In other words: Large language models are fascinating science. They are an academic wonder with huge potential and some early commercial successes, such as OpenAI’s ChatGPT and Anthropic’s Claude. But a bot that’s 80% accurate — a figure Newton made up, but we’ll go with it — isn’t a very useful consumer product.
Back in June, Apple floated a compelling scenario for its newfangled Siri. Imagine yourself, frazzled and running late for work, simply saying into your phone: Hey Siri, what time does my mom’s flight land? And is it at JFK or LaGuardia? In theory, Siri could scan your email and texts with your mom and give you an answer. That saves you several annoying steps of opening your email to find the flight number, copying it, then pasting it into Google to find the flight’s status.
If it’s 100% accurate, it’s a fantastic time saver. If it is anything less than 100% accurate, it’s useless. Because even if there’s a 2% chance it’s wrong, there’s a 2% chance you’re stranding mom at the airport, and mom will be, rightly, very disappointed. Our moms deserve better!
Bottom line: Apple is not the laggard in AI. AI is the laggard in AI.