Google Bibliography

The Anatomy of a Large-Scale Hypertextual Web Search Engine
http://citeseer.ist.psu.edu/brin98anatomy.html

The PageRank Citation Ranking: Bringing Order to the Web
http://citeseer.ist.psu.edu/page98pagerank.html

When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics
http://www.cs.toronto.edu/~georgem/BM01.html

The Google Cluster Architecture
http://www.computer.org/micro/mi2003/m2022.pdf

MapReduce: Simplified Data Processing on Large Clusters
http://labs.google.com/papers/mapreduce.html

The Google File System
http://labs.google.com/papers/gfs.html

Interpreting the Data: Parallel Analysis with Sawzall (Draft)
http://labs.google.com/papers/sawzall-sciprog.pdf

Query-free News Search
http://www.cs.berkeley.edu/~milch/papers/www2003.pdf

Web Information Retrieval - an Algorithmic Perspective
http://citeseer.ist.psu.edu/henzinger00web.html

Video: Google: A Behind-the-Scenes Look
http://www.researchchannel.org/program/displayevent.asp?rid=2459

Video: The Google Linux Cluster
http://www.researchchannel.org/program/displayevent.asp?rid=1680

Video: BigTable: A System for Distributed Structured Storage
http://www.researchchannel.org/program/displayevent.asp?rid=2787

Video: Petabyte Processing Made Easy
http://agenda.cern.ch/fullAgenda.php?ida=a053997

Audio: The Technology Behind Google
http://technetcast.ddj.com/tnc_play_stream.html?stream_id=420

Amazon.com Bibliography

Amazon.com Recommendations: Item-to-Item Collaborative Filtering
http://www.google.com/search?hl=en&lr=&biw=1004&q=Amazon.com+recommendations%3A+item-to-item+collaborative+filtering

Video: Amazon.com: A Data-Driven Enterprise
http://www.researchchannel.org/program/displayevent.asp?rid=2482

Video: Internet Search Engines
http://www.researchchannel.org/program/displayevent.asp?rid=2078

Some information on how Amazon images are generated
http://aaugh.com/imageabuse.html

Inktomi

Lessons from Giant-Scale Services
http://www.cs.berkeley.edu/~brewer/papers/GiantScale-IEEE.pdf

Combining Systems and Databases: A Search Engine Retrospective
http://www.cs.berkeley.edu/~brewer/cs262b/SearchRetro.pdf

A New Architecture for Managing Enterprise Log Data
http://www.usenix.org/events/lisa02/tech/full_papers/sah/sah.pdf

TITAN: A Next-Generation Infrastructure for Integrating Computing and Communication
http://www.cs.berkeley.edu/Research/Projects/titan/

AltaVista

Mercator: A Scalable, Extensible Web Crawler
http://research.compaq.com/SRC/mercator/papers/www/paper.pdf

Performance Limitations of the Java Core Libraries
http://research.compaq.com/SRC/mercator/papers/Java99/final.pdf

Syntactic Clustering of the Web
http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/abstracts/src-tn-1997-015.html

Identifying and Filtering Near-Duplicate Documents
http://www.cs.princeton.edu/courses/archive/spring05/cos598E/bib/CPM%25202000.pdf

Analysis of a Very Large AltaVista Query Log
http://citeseer.ist.psu.edu/silverstein98analysis.html

eBay

Ticketmaster