Rich Skrenta knows a thing or two about search engines and crawlers. Here’s his easy two step process of building your own one.

Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it..

Search is like 7 hard problems wrapped into a stack. Distributed systems, html analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale… Apart from the sexy high end algos there are also the boring 10-year old system libraries and off-the-shelf tools that crack under stress and sometimes need a look. You open the hood and wonder how the thing ever worked in the first place…


SPEAK / ADD YOUR COMMENT
Comments are moderated.

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Return to Top