What a Search Engine Really Is

Lesson 1 · taught by Pip

When you search for something, it feels like the engine races off across the whole internet, reads every page, and rushes back with answers. That's the natural picture. It's also wrong — and the truth is far more clever.

A search engine does not look at the live web when you press enter. It looks at a copy it made earlier. Think of a librarian who has already read every book in the world and written tidy notes about each one. When you ask a question, she doesn't re-read the books. She flips through her notes, which is why the answer comes back so fast.

That's the whole secret. The hard, slow work happens before you ever search. By the time you type, the engine is just consulting notes it prepared in advance.

The three jobs

Every search engine does three separate jobs, and it helps to keep them apart in your mind.

The first job is crawling — wandering the web, page by page, reading what's there. This is the librarian walking the shelves, opening each book. We'll cover it next lesson.

The second job is indexing — filing what was read into an organized system so it can be found again. This is the librarian writing her notes and sorting them. Without it, the engine would have a pile of pages and no way to look anything up.
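
If you'd like to see the librarian's notes as code, here is a minimal sketch of one common way to file pages, often called an inverted index: for each word, keep a list of the pages that contain it. The page names, the texts, and the helper names build_index and lookup are all made up for illustration, and real engines store far more than this.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> text the crawler already read.
pages = {
    "cats.html": "cats sleep most of the day",
    "dogs.html": "dogs love to play fetch",
    "pets.html": "cats and dogs can share a home",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def lookup(index, word):
    """Return the pages containing a word, without re-reading any page."""
    return sorted(index.get(word.lower(), set()))

# The slow part: done once, ahead of time.
index = build_index(pages)

# The fast part: done on every search.
print(lookup(index, "cats"))  # ['cats.html', 'pets.html']
```

Notice that lookup never touches the page texts. It only consults the notes, which is the whole point of filing them first.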

The third job is ranking — deciding, out of the thousands of pages that match your words, which ten to show first. This is the librarian judging which book actually answers your question best. It's the job people argue about most, and we'll spend real time on it.
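
To make "judging" a little more concrete, here is a toy sketch that scores each page by how many times the query's words appear in it, then shows the highest scores first. This word-counting rule is only an assumption for illustration; real engines weigh many more signals. The pages and the names score and rank are hypothetical.

```python
# Hypothetical pages that all mention the query in some way.
pages = {
    "tea-history.html": "tea was traded for centuries",
    "brew-tea.html": "how to brew tea warm the pot add tea steep the tea",
    "coffee.html": "coffee or tea which is better",
}

def score(text, query):
    """Count how often the query's words appear in the text (a toy score)."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def rank(pages, query, top_n=10):
    """Return up to top_n matching page names, highest score first."""
    scored = [(score(text, query), name) for name, text in pages.items()]
    matches = [(s, name) for s, name in scored if s > 0]
    # Sort by score descending, then by name so ties come out in a stable order.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches[:top_n]]

print(rank(pages, "brew tea"))
# ['brew-tea.html', 'coffee.html', 'tea-history.html']
```

All three pages match, but the one that talks about brewing tea the most lands on top. Deciding what "best" should mean is exactly where the arguments start.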

Crawl, index, rank. Read the web, file it, sort it. Hold those three words and you already understand the shape of the whole machine.

Why the copy matters

Here's why making a copy first is so smart. The web is enormous (billions of pages) and it changes constantly. If the engine searched it live, it would have to fetch every one of those pages over the network for each question. That would take hours at best, and many sites would respond too slowly or not at all.

By reading everything ahead of time and keeping organized notes, the engine turns a hopeless task into a quick lookup. Our librarian read the books on quiet afternoons over months; now, when you ask, she just glances at her notes. The reading was slow, but the answering is instant — because the slow part already happened.

The trade-off is that the notes can fall a little behind reality. A page might change after the engine last read it, so what you see can be slightly stale. That's why engines keep crawling — re-reading pages to keep the notes fresh. 🔦
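
One simple way to picture "keeping the notes fresh" is a rule like "re-read any page whose copy is older than a few days." The sketch below assumes exactly that fixed-age rule, which is a simplification chosen for illustration and not how any particular engine schedules its crawling. The page names, dates, and the function pages_to_recrawl are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical notes: page -> when the engine last read it.
last_crawled = {
    "news.html": datetime(2024, 1, 10),
    "about.html": datetime(2024, 1, 1),
    "blog.html": datetime(2024, 1, 8),
}

def pages_to_recrawl(last_crawled, now, max_age):
    """Return pages whose copy is older than max_age, stalest first."""
    stale = [
        (seen, name)
        for name, seen in last_crawled.items()
        if now - seen > max_age
    ]
    return [name for seen, name in sorted(stale)]

now = datetime(2024, 1, 11)
print(pages_to_recrawl(last_crawled, now, max_age=timedelta(days=2)))
# ['about.html', 'blog.html']
```

Here news.html was read just a day ago, so it can wait, while the other two are due for another visit.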

A word on "the web"

One honest limit: a search engine can only find pages it's allowed to reach. Things behind a login (your email, private documents, many paywalled articles) usually aren't in the notes, because the librarian was never let in. So "searching the web" really means "searching the public, reachable web that this engine has read." Good to know what's not in the box.

Your turn

Search for anything, then glance at the very top of the results. Some engines print an estimate of how many matches they found ("About 4,200,000 results") and how long it took (a fraction of a second); on others the count is hidden or tucked behind a menu. Either way, notice the mismatch: a huge number of matches, returned in a blink. That speed is the pre-made copy doing its work.

Next we'll follow the librarian onto the shelves — "Crawling: How the Web Gets Read."
