Guardrails: Keeping Dangerous Actions Off by Default
The strongest safety idea in this whole course is also the simplest, and once you've seen it you'll never un-see it: *the dangerous things should be turned off by default, so the AI can't do them even if a customer or a clever prompt tries to make it.*
This is the difference between a tool that's safe because everyone behaves, and a tool that's safe because misbehaving isn't possible. You want the second kind.
Why "off by default" beats "trained to be careful"
You can tell an AI "please don't give refunds," and most of the time it will listen. But people are inventive and AI assistants are built to be helpful, so someone will eventually phrase a request just so, and a bot that has only been instructed may cave. (Talking a bot past its instructions this way is often called a "jailbreak," and it does happen in practice.)
The robust answer isn't better instructions. It's capability: simply don't give the assistant the ability to issue a refund, change a price, delete a record, or send money. If the power isn't wired up, no clever wording can summon it. A bot that physically cannot move money is one you never have to worry about being tricked into moving money.
Think of it like the cash register. You don't train every employee to resist the temptation of the drawer and call it security. You lock the drawer and give the key to a manager. Same instinct, applied to AI.
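For readers who want to see what "not wired up" can look like in practice, here is a minimal sketch. It assumes a setup where the assistant can only act by asking your software to run a named tool. The tool names (`lookup_order_status`, `get_store_hours`) and the `run_tool` function are hypothetical, made up for illustration, and do not come from any particular product.

```python
from typing import Callable, Dict


# Hypothetical read-only tools. They only report information.
def lookup_order_status(order_id: str) -> str:
    return f"Order {order_id}: shipped"


def get_store_hours(_: str = "") -> str:
    return "Open 9am-5pm, Monday to Friday"


# The allowlist is the guardrail. Nothing that moves money, changes
# prices, or deletes records is registered here, so there is nothing
# for a clever prompt to trigger.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order_status": lookup_order_status,
    "get_store_hours": get_store_hours,
}


def run_tool(name: str, argument: str) -> str:
    """Run a tool the assistant asked for, but only if it is wired up."""
    tool = TOOLS.get(name)
    if tool is None:
        # Unknown or locked actions fall through to a refusal.
        return f"'{name}' is not available to this assistant."
    return tool(argument)
```

In this sketch, a request to run `issue_refund` gets the "not available" reply no matter how the customer worded it, because the refusal comes from the missing wiring rather than from the assistant's judgment.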
The short list of things to keep locked
For almost any business, these stay off unless you have a specific, deliberate reason and proper protection around them:
- Moving or refunding money.
- Changing prices, discounts, or contract terms.
- Deleting or overwriting records.
- Sending anything official on your behalf without review.
- Giving medical, legal, or financial advice (general information is fine; telling a specific person what to do in their situation is not).
By default, the AI's job is to inform and route, not to act. Informing is safe. Acting is where the risk lives.
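The locked list above can be written down as a deny-by-default setting. The following is a rough sketch under assumptions: the `Action` names and the `is_permitted` function are invented here to mirror the list, and a real system would enforce this wherever actions are actually carried out.

```python
from enum import Enum
from typing import FrozenSet


class Action(Enum):
    # Informing and routing: the assistant's default job.
    ANSWER_QUESTION = "answer_question"
    ROUTE_TO_HUMAN = "route_to_human"
    # The locked list from this lesson.
    MOVE_MONEY = "move_money"
    CHANGE_PRICE = "change_price"
    DELETE_RECORD = "delete_record"
    SEND_OFFICIAL_MESSAGE = "send_official_message"
    GIVE_PROFESSIONAL_ADVICE = "give_professional_advice"


# Only informing and routing are on without anyone deciding anything.
SAFE_BY_DEFAULT: FrozenSet[Action] = frozenset(
    {Action.ANSWER_QUESTION, Action.ROUTE_TO_HUMAN}
)


def is_permitted(
    action: Action,
    explicitly_enabled: FrozenSet[Action] = frozenset(),
) -> bool:
    """Everything is off unless it is safe by default or a person
    deliberately switched it on."""
    return action in SAFE_BY_DEFAULT or action in explicitly_enabled
```

The design choice worth noticing is the empty default for `explicitly_enabled`: forgetting to configure something leaves it off, so a mistake fails toward safety.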
The human handoff: your most important safety feature
The companion to locked-off actions is a graceful exit to a person. A great assistant knows its edges and says, warmly: "That's something I'll have our team handle — let me take your details and someone will reach out." This isn't a failure of the AI. It's the AI working exactly as designed. The handoff is a feature, not a flaw, and customers respect a tool that knows when to fetch a human far more than one that bluffs.
This connects straight back to lesson 3: a scoped assistant that hands off is safe; an unscoped one that improvises is a liability.
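The handoff described above can be sketched in a few lines. This is an illustration under assumptions, not a finished design: the intent labels, `HandoffQueue`, and `respond` are hypothetical names, and how a real assistant detects the customer's intent is outside this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

HANDOFF_MESSAGE = (
    "That's something I'll have our team handle. "
    "Let me take your details and someone will reach out."
)

# Hypothetical labels for requests that need a locked action.
LOCKED_INTENTS = {"refund", "price_change", "delete_record"}


@dataclass
class HandoffQueue:
    """A simple list of requests waiting for a person."""
    tickets: List[Dict[str, str]] = field(default_factory=list)

    def add(self, customer: str, request: str) -> None:
        self.tickets.append({"customer": customer, "request": request})


def respond(intent: str, customer: str, request: str,
            queue: HandoffQueue) -> str:
    """Answer safe requests; pass locked ones to a human."""
    if intent in LOCKED_INTENTS:
        # Record the details so the team can follow up, then say so.
        queue.add(customer, request)
        return HANDOFF_MESSAGE
    return "Happy to help with that."
```

A refund request here produces a ticket for the team and the warm handoff line, while an ordinary question is answered directly and leaves the queue untouched.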
Your turn
From your lesson-3 no-go list, pick the single most dangerous action your AI could theoretically take in your business. Now ask the real question: is that ability even wired up? The ideal answer is "no — it's off, and only a human can do it." If you're not sure, that uncertainty is exactly what to resolve before you trust any AI with customers.
🔦 Guardrails keep the AI from acting wrongly. But what about when it speaks wrongly — confidently telling a customer something false? That's the next rock.