<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>lucasjosh.com &#187; Ruby</title>
	<atom:link href="http://lucasjosh.com/blog/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://lucasjosh.com/blog</link>
	<description></description>
	<lastBuildDate>Sat, 21 Jan 2012 00:51:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A Good Day for Hadoop</title>
		<link>http://lucasjosh.com/blog/2009/05/12/a-good-day-for-hadoop/</link>
		<comments>http://lucasjosh.com/blog/2009/05/12/a-good-day-for-hadoop/#comments</comments>
		<pubDate>Tue, 12 May 2009 15:47:36 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=268</guid>
		<description><![CDATA[Yesterday was a very good day for the Hadoop project. Yahoo! announced that it used a roughly 3,800-node cluster to sort through a petabyte of data in a little over 16 hours. It&#8217;s an amazing feat for any project but &#8230; <a href="http://lucasjosh.com/blog/2009/05/12/a-good-day-for-hadoop/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Yesterday was a very good day for the <a href="http://hadoop.apache.org">Hadoop</a> project.  </p>
<p>Yahoo! <a href="http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html">announced</a> that it used a roughly 3,800-node cluster to sort through a petabyte of data in a little over 16 hours.  It&#8217;s an amazing feat for any project, but especially for one with as much potential as Hadoop.</p>
<p>The other good news was the release of <a href="http://code.google.com/p/mrtoolkit/">mrtoolkit</a>, a map-reduce library written in Ruby.  It uses <a href="http://wiki.apache.org/hadoop/HadoopStreaming">Hadoop Streaming</a> and will make it easy to run jobs and crunch data.  It comes out of the <a href="http://open.blogs.nytimes.com/2009/05/11/announcing-the-mapreduce-toolkit/">New York Times dev group</a>, and I applaud them.</p>
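<p>For anyone who hasn&#8217;t used Streaming: Hadoop runs your mapper and reducer as ordinary programs. They read lines on STDIN and write tab-separated key/value lines to STDOUT, and Hadoop sorts the mapper output by key in between. Here is a minimal word-count sketch of that contract in plain Ruby. It is not mrtoolkit&#8217;s API, and the method names are made up for illustration.</p>
<pre><code class="language-ruby"># Word count in the Hadoop Streaming style: read lines from STDIN,
# write tab-separated key/value lines to STDOUT. A generic sketch,
# not mrtoolkit's API.

# Map phase: emit "word\t1" for every word on every input line.
def map_lines(lines)
  lines.flat_map do |line|
    line.downcase.scan(/[a-z0-9']+/).map { |word| "#{word}\t1" }
  end
end

# Reduce phase: Hadoop sorts mapper output by key before the reducer
# runs, so all lines for a given word arrive next to each other.
def reduce_lines(sorted_lines)
  results = []
  current, total = nil, 0
  sorted_lines.each do |line|
    word, count = line.chomp.split("\t", 2)
    if word != current
      results.push("#{current}\t#{total}") if current
      current, total = word, 0
    end
    total += count.to_i
  end
  results.push("#{current}\t#{total}") if current
  results
end

# Only read STDIN when invoked as "ruby wordcount.rb map" or "... reduce".
if __FILE__ == $0
  case ARGV.first
  when "map"    then map_lines(STDIN.each_line).each { |out| puts out }
  when "reduce" then reduce_lines(STDIN.each_line).each { |out| puts out }
  end
end
</code></pre>
<p>You would pass the script (the hypothetical <code>wordcount.rb</code>) to the streaming jar through its <code>-mapper</code> and <code>-reducer</code> options and ship it with <code>-file</code>. The jar&#8217;s location depends on your Hadoop install.</p>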
<p>I&#8217;ll have to figure out how mrtoolkit differs from <a href="http://github.com/mrflip/wukong/tree/master">Wukong</a>; hopefully some sort of merging of the two can happen.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2009/05/12/a-good-day-for-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Testing with Redis</title>
		<link>http://lucasjosh.com/blog/2009/03/18/testing-with-redis/</link>
		<comments>http://lucasjosh.com/blog/2009/03/18/testing-with-redis/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 19:16:04 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=265</guid>
		<description><![CDATA[Long time, no blog&#8230; But enough about that. On the side, I&#8217;ve been working on a new aggregator, Aggir, which allows me to test various things. I started off using SQLite and Sequel for storage, put Solr behind the scenes &#8230; <a href="http://lucasjosh.com/blog/2009/03/18/testing-with-redis/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Long time, no blog&#8230;  But enough about that.</p>
<p>On the side, I&#8217;ve been working on a new aggregator, <a href="http://github.com/lucasjosh/aggir/tree/master">Aggir</a>, which allows me to test various things.  I started off using SQLite and Sequel for storage, put Solr behind the scenes for search and added a very simple Web UI using Sinatra and HAML.  Yeah, I think I pretty much used all the necessary <i>hot</i> projects.  It was fun to build and it works pretty well right now.    </p>
<p>I have more to do on the Solr front since I&#8217;m just using the defaults for relevance searching.  I&#8217;d like to dig more into the Solr internals for additional query parsing and classification at index time.  It&#8217;s similar to some of the stuff I&#8217;ve been doing at work, but I wanted to try it on a different type of data set.</p>
<p>Of course, now that I had things somewhat stable, I decided to blow it all up and try something new.  That something new is <a href="http://code.google.com/p/redis/">Redis</a> using <a href="http://github.com/ezmobius/redis-rb/tree/master">Ezra&#8217;s client library</a>.  </p>
<p>I started down the path of updating everything, ripping out the database storage to use Redis instead.  So far so good: I have <a href="http://github.com/lucasjosh/aggir/tree/redis-storage">the start of this on a branch</a>.  One issue I found, though, was testing my code.  It was <i>simple</i> with Sequel since I could create a different database without any worry of overwriting real data.  With Redis, I can easily delete keys between tests, but the keys were the same ones a real update would use, so non-test data would be deleted too.</p>
<p>I think I&#8217;ve come up with a solution that at least works for me.  I&#8217;ve made each key combine a prefix with other data.  The prefixes are defined as class variables, and the library code only sets them if they haven&#8217;t already been defined elsewhere.  In my test code, I set them with an additional test-specific prefix so that I can easily find all of the testing keys with <code>keys('test_*')</code>.  This lets me walk through all of the test keys created during a test and delete them before running the next one.  This mirrors what I was doing with the database.</p>
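<p>Here&#8217;s a minimal sketch of that setup. <code>PostStore</code> and its prefixes are hypothetical names, not Aggir&#8217;s actual code. <code>FakeRedis</code> is an in-memory stand-in that covers just SET, GET, KEYS and DEL, so the example runs without a server. With a real connection you would pass the redis-rb client in its place, assuming its <code>keys</code> call returns an array of key names.</p>
<pre><code class="language-ruby"># --- test helper, loaded BEFORE the library ---
# Pre-define the prefixes so every key the library writes starts with "test_".
class PostStore
  @@post_prefix = "test_post:"
  @@feed_prefix = "test_feed:"
end

# --- library code (hypothetical) ---
class PostStore
  # Keep any prefix a test helper already set; otherwise use the real one.
  @@post_prefix = "post:" unless defined?(@@post_prefix)
  @@feed_prefix = "feed:" unless defined?(@@feed_prefix)

  def initialize(redis)
    @redis = redis
  end

  def save_post(id, body)
    @redis.set("#{@@post_prefix}#{id}", body)
  end

  def add_feed(id, url)
    @redis.set("#{@@feed_prefix}#{id}", url)
  end
end

# In-memory stand-in for a Redis client (SET, GET, KEYS, DEL only)
# so this sketch runs without a server.
class FakeRedis
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  # Redis KEYS takes a glob-style pattern; File.fnmatch approximates it here.
  def keys(pattern)
    @data.keys.select { |key| File.fnmatch(pattern, key) }
  end

  def del(key)
    @data.delete(key) ? 1 : 0
  end
end

# Run between tests: only keys carrying the test prefix are removed.
def clear_test_keys(redis)
  redis.keys("test_*").each { |key| redis.del(key) }
end
</code></pre>
<p>The test helper defines the class variables before the library file loads, so the <code>unless defined?</code> guards leave them alone. One caveat: KEYS walks the entire keyspace. That is fine on a development instance but not something to run against a busy production server.</p>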
<p>I&#8217;m now able to test on the same instance that I&#8217;ve loaded with posts from various blogs.  I have more to say about the mindset change from a relational db to key-value storage but I wanted to get this post out.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2009/03/18/testing-with-redis/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Getting Down to the Metal</title>
		<link>http://lucasjosh.com/blog/2008/12/18/getting-down-to-the-metal/</link>
		<comments>http://lucasjosh.com/blog/2008/12/18/getting-down-to-the-metal/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 14:03:56 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Rails]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Scaling]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=214</guid>
		<description><![CDATA[Rails Metal looks pretty darn awesome. It lets you define specific URI paths that bypass the normal Rails stack, shaving precious milliseconds off your responses and not making the Baby Jesus cry. As a byproduct, with a simple &#8230; <a href="http://lucasjosh.com/blog/2008/12/18/getting-down-to-the-metal/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://weblog.rubyonrails.org/2008/12/17/introducing-rails-metal">Rails Metal</a> looks pretty darn awesome.  It allows you to specify specific URI paths which will bypass the normal Rails stack, shaving precious milliseconds off your responses and not making the Baby Jesus cry.</p>
<p>As a byproduct, with a simple config item you can start using <a href="http://tomayko.com/src/rack-cache/">Rack::Cache</a>, a very good HTTP cache that will usually give you enough benefit until your traffic really takes off.</p>
<p>Jesse Newland <a href="http://soylentfoo.jnewland.com/articles/2008/12/16/rails-metal-a-micro-framework-with-the-power-of-rails-m">has an even better overview</a> of Metal including an example of using <a href="http://sinatra.rubyforge.org">Sinatra</a> with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2008/12/18/getting-down-to-the-metal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Looking into HBase</title>
		<link>http://lucasjosh.com/blog/2008/11/11/looking-into-hbase/</link>
		<comments>http://lucasjosh.com/blog/2008/11/11/looking-into-hbase/#comments</comments>
		<pubDate>Tue, 11 Nov 2008 17:04:23 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[REST]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/?p=197</guid>
		<description><![CDATA[HBase is the open source implementation of Google&#8217;s Bigtable. I&#8217;ve been keeping my eye on it in combination with Hadoop. I had some extra time today so I decided to see how easy it would be to hook it up &#8230; <a href="http://lucasjosh.com/blog/2008/11/11/looking-into-hbase/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://hadoop.apache.org/hbase">HBase</a> is the open source implementation of <a href="http://labs.google.com/papers/bigtable.html">Google&#8217;s Bigtable</a>.  I&#8217;ve been keeping my eye on it in combination with <a href="http://wiki.apache.org/hadoop/">Hadoop</a>.  I had some extra time today so I decided to see how easy it would be to hook it up with the aggregator we built for things like <a href="http://topics.latimes.com">Topics</a>.</p>
<p>One of the nice things about HBase is the <a href="http://wiki.apache.org/hadoop/Hbase/HbaseRest">REST interface</a> that can read and write data.  I hooked up <a href="http://github.com/sishen/hbase-ruby">the Ruby client</a> so that whenever I saved posts from the feed to MySQL, it would also send data to HBase.  </p>
<p>Writing to HBase is pretty straightforward, and the REST client makes it really easy.  However, getting the data out needs a closer look.</p>
<p>HBase is NOT a relational database.  If you approach it like one, you will get utterly confused and frustrated.  Instead, it can be thought of as a collection of Maps.  So, in order to get data out, you need to iterate over the Maps looking for particular columns.</p>
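<p>To make the Map idea concrete, here&#8217;s a rough sketch with plain Ruby hashes standing in for a table: first the row key, then a <code>family:qualifier</code> column, then timestamped versions of the value. The row keys, columns and values are invented for the example. Real HBase reads go through a client, not a Hash.</p>
<pre><code class="language-ruby"># Illustrative nested-map view of an HBase table:
#   row key => { "family:qualifier" => { timestamp => value } }
# Row keys, column names and values are made up for this example.
TABLE = {
  "post-0001" => {
    "content:title" => { 100 => "Draft title", 200 => "First post" },
    "meta:feed"     => { 100 => "feed-a" }
  },
  "post-0002" => {
    "content:title" => { 300 => "Second post" }
  }
}

# No SQL-style query: walk the rows in row-key order (the order HBase
# keeps them in) and pull out the one column you're interested in.
def column_values(table, column)
  table.keys.sort.each_with_object({}) do |row_key, found|
    versions = table[row_key][column]
    next unless versions
    # Keep only the newest version, which is what a plain read returns by default.
    found[row_key] = versions[versions.keys.max]
  end
end
</code></pre>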
<p>With the REST API, you do this by creating a scanner and <i>pop</i>&#8217;ing the results off it, as you would from a queue.</p>
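<p>The scanner pattern boils down to three steps: open it, keep asking for the next batch until it comes back empty, then close it. The sketch below uses a hypothetical in-memory <code>FakeScanner</code> in place of a scanner opened over REST. The real client&#8217;s method names will differ.</p>
<pre><code class="language-ruby"># Hypothetical in-memory stand-in for a scanner opened through the REST
# interface; only the open / next / close shape is the point here.
class FakeScanner
  def initialize(rows, batch_size)
    # HBase hands rows back in row-key order.
    @rows = rows.sort_by { |row_key, _| row_key }
    @batch_size = batch_size
  end

  # Pops the next batch off the front, or returns [] once exhausted.
  def next_batch
    @rows.shift(@batch_size)
  end

  def close
    @rows = []
  end
end

# Drain the scanner like a queue, always closing it afterwards.
def scan_all(scanner)
  results = []
  loop do
    batch = scanner.next_batch
    break if batch.empty?
    results.concat(batch)
  end
  results
ensure
  scanner.close
end
</code></pre>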
<p>That&#8217;s some of what I found out; let&#8217;s see what else I can dig into today.</p>
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2008/11/11/looking-into-hbase/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Starling</title>
		<link>http://lucasjosh.com/blog/2008/01/16/starling/</link>
		<comments>http://lucasjosh.com/blog/2008/01/16/starling/#comments</comments>
		<pubDate>Wed, 16 Jan 2008 14:50:06 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://lucasjosh.com/blog/2008/01/16/starling/</guid>
		<description><![CDATA[Blaine Cook from Twitter released Starling last week. He describes it as: Starling is a light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter&#8217;s backend, and is in production across Twitter&#8217;s cluster. I&#8217;m always &#8230; <a href="http://lucasjosh.com/blog/2008/01/16/starling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://romeda.org/">Blaine Cook</a> from <a href="http://www.twitter.com">Twitter</a> released <a href="http://rubyforge.org/projects/starling/">Starling</a> last week.  He describes it as:</p>
<blockquote><p>
Starling is a light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter&#8217;s backend, and is in production across Twitter&#8217;s cluster.
</p></blockquote>
<p>I&#8217;m always a sucker for new infrastructure code, but the problem is figuring out whether it will fit within the architecture that&#8217;s already in place.  I think I might have found a spot for it, though.</p>
<p>We have an aggregator grabbing various RSS feeds, both internal and external.  Right now, while parsing feeds and adding entries to the database, we send the information off to our indexer for later searching.  This is basically how the <a href="http://topics.latimes.com">Topics pages</a> are built dynamically.</p>
<p>Instead of sending the data at run time, maybe it would be better to just add it to a Starling queue for later processing.  A job in the indexer could check the queue and index anything that&#8217;s there.  This would also allow other applications to add items that need indexing.</p>
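<p>Here is a rough sketch of that split. It assumes the starling gem&#8217;s memcache-style client, where <code>set</code> pushes onto a named queue and <code>get</code> pops the oldest item (or returns nil when the queue is empty). <code>FakeStarling</code> is an in-memory stand-in so the example runs without a server. The queue name and the indexer callable are hypothetical.</p>
<pre><code class="language-ruby"># In-memory stand-in for a Starling client: set appends to a named
# queue, get removes and returns the oldest item (nil when empty).
class FakeStarling
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def set(queue, item)
    @queues[queue].push(item)
  end

  def get(queue)
    @queues[queue].shift
  end
end

QUEUE = "posts_to_index" # hypothetical queue name

# Aggregator side: enqueue the parsed post instead of calling the indexer inline.
def enqueue_post(queue_client, post)
  queue_client.set(QUEUE, post)
end

# Indexer side: a periodic job drains whatever is waiting and
# returns how many items it handed to the (hypothetical) indexer.
def drain_queue(queue_client, indexer)
  count = 0
  while (post = queue_client.get(QUEUE))
    indexer.call(post)
    count += 1
  end
  count
end
</code></pre>
<p>Against a running server you would swap in <code>Starling.new('localhost:22122')</code> (22122 being Starling&#8217;s default port) and run <code>drain_queue</code> from a periodic job in the indexer. Any other application could call <code>enqueue_post</code> the same way.</p>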
]]></content:encoded>
			<wfw:commentRss>http://lucasjosh.com/blog/2008/01/16/starling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

