<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>lucasjosh.com &#187; Search</title>
	<atom:link href="http://lucasjosh.com/blog/category/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucasjosh.com/blog</link>
	<description></description>
	<lastBuildDate>Mon, 01 Mar 2010 14:15:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using Solr&#8217;s AbstractSolrTestCase</title>
		<link>http://lucasjosh.com/blog/2009/06/28/using-solrs-abstractsolrtestcase/</link>
		<comments>http://lucasjosh.com/blog/2009/06/28/using-solrs-abstractsolrtestcase/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 05:49:27 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=270</guid>
		<description><![CDATA[This past week I worked with Solr&#8217;s AbstractSolrTestCase, which extends JUnit&#8217;s TestCase.  In theory, this makes it easier to create tests that hit an index and run through the entire search pipeline if necessary.
Unfortunately, there isn&#8217;t much documentation, but there are plenty of examples within Solr&#8217;s source to [...]]]></description>
			<content:encoded><![CDATA[<p>This past week I worked with Solr&#8217;s <a href="http://lucene.apache.org/solr/api/org/apache/solr/util/AbstractSolrTestCase.html">AbstractSolrTestCase</a>, which extends JUnit&#8217;s TestCase.  In theory, this makes it easier to create tests that hit an index and run through the entire search pipeline if necessary.</p>
<p>Unfortunately, there isn&#8217;t much documentation, but there are plenty of examples within Solr&#8217;s source to learn from.</p>
<p>That being said, here are a few things I found out while working with it.</p>
<p>Because of the way the setUp method worked, I basically needed to duplicate much of its functionality instead of calling <i>super.setUp()</i>.  By default, setUp creates Solr&#8217;s data directory under <i>java.io.tmpdir</i> (generally /tmp on Unix systems), in a directory named after the test class plus a timestamp.  This was a problem for us because it meant that the index would be created in a new directory for each test.  </p>
<p>I realize the need for atomic data in unit tests, but I viewed these Solr tests more as integration tests than true unit tests.  They were going through the entire system as opposed to focusing on just one class or section.</p>
<p>To create the index, we were hooking pieces up to our current indexing pipeline, a very nice plug-in system we developed that passes data through various stations to either clean it or retrieve more of it.  Thankfully, only a few places actually interacted with Solr, so I was able to mock that communication out and pass the collected data to the <i>adoc / update</i> methods.</p>
<p>Because the pipeline wasn&#8217;t instantaneous, I wanted to reuse the indexes as much as possible.  I figured a good middle ground would be for each test class to have its own index and to allow it to tell the indexing pipeline what data it wanted indexed.  That index would stay until the physical directory was deleted, and then it would be recreated with updated data.</p>
<p>So I basically had to copy much of the existing setUp method and create the data directory with the test class name but no timestamp as well as make the tearDown method a no-op.</p>
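<p>A minimal sketch of that directory scheme, using only the JDK.  It does not call any Solr API: the class name <i>ReusableIndexDir</i>, its method names and the "-" separator are hypothetical and exist only to illustrate the naming difference.</p>

```java
import java.io.File;

// Hypothetical helper, not part of Solr. It only illustrates how the
// data directory name decides whether an index is rebuilt or reused.
class ReusableIndexDir {

    // Per-run layout as described above: tmpdir + class name + timestamp.
    // The "-" separator is an assumption made for this sketch.
    static File perRunDir(String testClassName, long timestamp) {
        return new File(System.getProperty("java.io.tmpdir"),
                testClassName + "-" + timestamp);
    }

    // Stable layout: tmpdir + class name only, so every run of the same
    // test class resolves to the same directory.
    static File perClassDir(String testClassName) {
        return new File(System.getProperty("java.io.tmpdir"), testClassName);
    }

    // Returns true when the directory had to be created, meaning the
    // index needs to be built; returns false when it can be reused.
    static boolean ensureDir(File dir) {
        if (dir.isDirectory()) {
            return false;
        }
        if (!dir.mkdirs()) {
            throw new IllegalStateException("Could not create " + dir);
        }
        return true;
    }

    public static void main(String[] args) {
        File dir = perClassDir(ReusableIndexDir.class.getName());
        boolean needsIndexing = ensureDir(dir);
        System.out.println(dir + " needs indexing: " + needsIndexing);
    }
}
```

<p>In a test base class, a setUp method could call something like <i>ensureDir</i> and run the indexing pipeline only when it returns true.  Deleting the directory by hand then forces a rebuild on the next run.</p>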
<p>With all of this done, I now have a class which any developer can extend which hopefully will increase our test coverage.  </p>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2009/06/28/using-solrs-abstractsolrtestcase/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Google Flu Trends</title>
		<link>http://lucasjosh.com/blog/2008/11/11/google-flu-trends/</link>
		<comments>http://lucasjosh.com/blog/2008/11/11/google-flu-trends/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 07:15:07 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=199</guid>
		<description><![CDATA[Lots of people are linking to it but Google&#8217;s Flu Trends is a pretty amazing site.  
The things you can figure out when you have the incredible amount of data Google has access to can provide insights into things previously not possible.  I really think the idea that the CDC was up to [...]]]></description>
			<content:encoded><![CDATA[<p>Lots of people are linking to it but <a href="http://www.google.org/flutrends/">Google&#8217;s Flu Trends</a> is a pretty amazing site.  </p>
<p>With the incredible amount of data Google has access to, you can find insights that were previously not possible.  I really think the idea that the CDC was up to two weeks behind in noticing the outbreaks says the most.</p>
<p>You can also <a href="http://www.google.org/about/flutrends/download.html">download the raw data</a> and display it in other ways if you&#8217;d like.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2008/11/11/google-flu-trends/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating a Search Engine</title>
		<link>http://lucasjosh.com/blog/2008/08/02/creating-a-search-engine/</link>
		<comments>http://lucasjosh.com/blog/2008/08/02/creating-a-search-engine/#comments</comments>
		<pubDate>Sun, 03 Aug 2008 00:01:37 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=104</guid>
		<description><![CDATA[Rich Skrenta knows a thing or two about search engines and crawlers.  Here&#8217;s his easy two-step process for building your own.

Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it..
&#8230;
Search is like 7 hard problems wrapped into a stack. Distributed systems, html analytics, text analytics/semantics, anti-spam, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.skrenta.com/2008/05/blekko_is_hiring.html">Rich Skrenta</a> knows a thing or two about search engines and crawlers.  Here&#8217;s his easy two-step process for building your own.</p>
<blockquote><p>
Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it..</p>
<p>&#8230;</p>
<p>Search is like 7 hard problems wrapped into a stack. Distributed systems, html analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale&#8230; Apart from the sexy high end algos there are also the boring 10-year old system libraries and off-the-shelf tools that crack under stress and sometimes need a look. You open the hood and wonder how the thing ever worked in the first place&#8230;
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2008/08/02/creating-a-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
