Google Bibliography
The Anatomy of a Large-Scale Hypertextual Web Search Engine
http://citeseer.ist.psu.edu/brin98anatomy.html
The PageRank Citation Ranking: Bringing Order to the Web
http://citeseer.ist.psu.edu/page98pagerank.html
When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics
http://www.cs.toronto.edu/~georgem/BM01.html
The Google Cluster Architecture
http://www.computer.org/micro/mi2003/m2022.pdf
MapReduce: Simplified Data Processing on Large Clusters
http://labs.google.com/papers/mapreduce.html
The Google File System
http://labs.google.com/papers/gfs.html
Interpreting the Data: Parallel Analysis with Sawzall (Draft)
http://labs.google.com/papers/sawzall-sciprog.pdf
Query-Free News Search
http://www.cs.berkeley.edu/~milch/papers/www2003.pdf
Web Information Retrieval: An Algorithmic Perspective
http://citeseer.ist.psu.edu/henzinger00web.html
Video: Google: A Behind-the-Scenes Look
http://www.researchchannel.org/program/displayevent.asp?rid=2459
Video: The Google Linux Cluster
http://www.researchchannel.org/program/displayevent.asp?rid=1680
Video: BigTable: A System for Distributed Structured Storage
http://www.researchchannel.org/program/displayevent.asp?rid=2787
Video: Petabyte Processing Made Easy
http://agenda.cern.ch/fullAgenda.php?ida=a053997
Audio: The Technology Behind Google
http://technetcast.ddj.com/tnc_play_stream.html?stream_id=420
Amazon.com Bibliography
Amazon.com Recommendations: Item-to-Item Collaborative Filtering
http://www.google.com/search?hl=en&lr=&biw=1004&q=Amazon.com+recommendations%3A+item-to-item+collaborative+filtering
Video: Amazon.com: A Data-Driven Enterprise
http://www.researchchannel.org/program/displayevent.asp?rid=2482
Video: Internet Search Engines
http://www.researchchannel.org/program/displayevent.asp?rid=2078
Some information on how Amazon images are generated
http://aaugh.com/imageabuse.html
Inktomi
Lessons from Giant-Scale Services
http://www.cs.berkeley.edu/~brewer/papers/GiantScale-IEEE.pdf
Combining Systems and Databases: A Search Engine Retrospective
http://www.cs.berkeley.edu/~brewer/cs262b/SearchRetro.pdf
A New Architecture for Managing Enterprise Log Data
http://www.usenix.org/events/lisa02/tech/full_papers/sah/sah.pdf
TITAN: A Next-Generation Infrastructure for Integrating Computing and Communication
http://www.cs.berkeley.edu/Research/Projects/titan/
AltaVista
Mercator: A Scalable, Extensible Web Crawler
http://research.compaq.com/SRC/mercator/papers/www/paper.pdf
Performance Limitations of the Java Core Libraries
http://research.compaq.com/SRC/mercator/papers/Java99/final.pdf
Syntactic Clustering of the Web
http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/abstracts/src-tn-1997-015.html
Identifying and Filtering Near-Duplicate Documents
http://www.cs.princeton.edu/courses/archive/spring05/cos598E/bib/CPM%25202000.pdf
Analysis of a Very Large AltaVista Query Log
http://citeseer.ist.psu.edu/silverstein98analysis.html
eBay
Ticketmaster