Rich Skrenta knows a thing or two about search engines and crawlers. Here’s his easy two-step process for building your own:
Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it.
…
Search is like 7 hard problems wrapped into a stack. Distributed systems, HTML analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale… Apart from the sexy high-end algos there are also the boring 10-year-old system libraries and off-the-shelf tools that crack under stress and sometimes need a look. You open the hood and wonder how the thing ever worked in the first place…
