Basically, they set a computer program (called a "crawler") loose to browse the Internet and capture every page it reaches. The crawler follows the links on each page to find more pages, which means that heavily linked, popular pages are captured more often, while pages that no links point to (part of what is sometimes called the "deep web") are missed entirely. Pages that signal they do not want to be archived, via an instruction in a file called "robots.txt" at the site's root, are skipped by the crawler.
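To make the idea concrete, here is a minimal sketch of such a crawler in Python. It is only an illustration of the general technique described above, not the Archive's actual software: the seed address (example.com), the user-agent string, and the page limit are stand-ins chosen for the example. The crawler keeps a queue of addresses, checks each site's robots.txt before fetching, captures the page, and adds any links it finds back onto the queue.

    # Illustrative sketch of a link-following crawler that respects robots.txt.
    # The seed URL, user-agent name, and limits below are hypothetical.
    import urllib.request
    import urllib.robotparser
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse


    class LinkExtractor(HTMLParser):
        """Collects href values from <a> tags on a fetched page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)


    def allowed_by_robots(url, user_agent="example-crawler"):
        """Consult the site's robots.txt before fetching, as polite crawlers do."""
        parts = urlparse(url)
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        try:
            rp.read()
        except OSError:
            return True  # no readable robots.txt: treat the site as unrestricted
        return rp.can_fetch(user_agent, url)


    def crawl(seed_url, max_pages=10):
        """Fetch pages breadth-first, following links found on each captured page."""
        queue = deque([seed_url])
        seen = {seed_url}
        captured = {}
        while queue and len(captured) < max_pages:
            url = queue.popleft()
            if not allowed_by_robots(url):
                continue  # the page opted out via robots.txt, so skip it
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    html = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue
            captured[url] = html  # "capture" the page by storing its contents
            extractor = LinkExtractor()
            extractor.feed(html)
            for href in extractor.links:
                absolute = urljoin(url, href)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return captured


    if __name__ == "__main__":
        pages = crawl("https://example.com/")
        print(f"Captured {len(pages)} page(s)")

Notice how the sketch mirrors the limitations described above: a page is only ever discovered if some already-captured page links to it, and a page whose robots.txt disallows the crawler is never stored at all.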