- The end of Out of Town News
Dave brings word of the closing of Out of Town News in the heart of Harvard Square. While totally understandable, given the current shape newspapers are in, it is still pretty sad for me on a personal level.
When I lived in Cambridge, OoTN was a daily stop on my way to the T. I grabbed the Globe or the New York Times and headed into work. Once my daughter was born, we had a father-daughter walk each Sunday morning. She was in a stroller or a Baby Bjorn and we walked to get the Sunday New York Times. It allowed my wife to have a few minutes’ peace without a newborn in the house. I look back very, very fondly at that time, especially with my daughter now being almost a decade old.
Though some are trying to keep it going, it doesn’t look good for Out of Town News. I’m sure others have memories of it but these are mine.
- Getting Down to the Metal
Rails Metal looks pretty darn awesome. It allows you to designate specific URI paths which will bypass the normal Rails stack, shaving precious milliseconds off your responses and not making the Baby Jesus cry.
As a byproduct, with a simple config item you can start using Rack::Cache, a very good HTTP cache that will normally give you enough benefit until your traffic really takes off.
Jesse Newland has an even better overview of Metal including an example of using Sinatra with it.
- Scoble on Twitter vs. FriendFeed
Robert Scoble posts about why he thinks Twitter is for some people while FriendFeed is not. Wow! I don’t think I’ve ever read anything more arrogant or pandering. I’ve followed Scoble’s blog since his days before joining UserLand but I can’t recall anything like this.
Personally I haven’t found the need for FriendFeed. I keep up with folks in my social graph pretty well right now with a mixture of Twitter, Facebook and RSS feeds. Of course, I’m not trying to follow the number of people Scoble is, so we are using things differently.
I don’t mind lists like this but it really doesn’t have to be so arrogant.
- Boing Boing’s Nonfiction book list
Boing Boing has put together a great list of nonfiction books to help with your holiday shopping. I’ve pretty much just sent the link out to all my family members that usually get me books for Christmas.
- Hadoop’ing at My Desk

Last week, I started scrounging around the office for some unused PCs. Unfortunately, there were more than just a few because of all the things going on at the Times. I grabbed three, put them on my desk and spent the rest of a day installing Ubuntu on them. Everything went really smoothly and I was very pleasantly surprised that our IT department didn’t give me a hard time for wanting a switch in the office.
I used this post to help set up a Hadoop cluster. It went really smoothly and before I knew it, the future was sitting on my desk.
Why the future? The amount of data used by companies is increasing way beyond what it used to be and systems like Hadoop allow for that data to be dealt with in more humane ways than stuffing it into some sort of database and hoping your SQL-fu can slice and dice.
Of course, a three-node Hadoop cluster isn’t that impressive when you compare it to the 4000-node one used by Yahoo!. But that’s OK since this is just the beginning.
What am I doing with all this power, you ask? Well, let me give you an example. I have 10 years’ worth of archives loaded into the cluster. As part of the Articles project, I turned each text file into an Atom representation which has allowed us to do various things with the metadata. At first, I put each individual file into HDFS (the Hadoop Distributed File System) but then I would have needed to write some additional code for Hadoop to treat each file as a single record, as opposed to the default of splitting each file into lines. Eventually I’ll do that but it would have been yak shaving at the beginning.
Instead, I collapsed the files from each month into one, with each line being a story. This allowed Hadoop’s default splitter to go crazy. One of the first Map/Reduce jobs I wrote was to go through each story, find all of the A1 (front page) stories and see who wrote them. That would be the Map part of it, while the Reduce piece added all of the instances together so you could easily see the leaderboard. I mentioned this to one of my colleagues and he warned me that having that data fall into the wrong hands could destroy the newsroom. I think he was kidding but I’m not that sure.
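To make the split between the two halves concrete, here is a minimal sketch of that byline count in plain Python. It is not the job I actually ran on the cluster: the tab-separated line layout (page, byline, headline) and the function names are assumptions made up for illustration, and the real stories were Atom representations.

```python
from collections import defaultdict


def map_story(line):
    """Map step: emit (byline, 1) for every front-page (A1) story.

    Assumes a hypothetical layout of page<TAB>byline<TAB>headline per line.
    """
    page, byline, _headline = line.rstrip("\n").split("\t", 2)
    if page == "A1":
        yield byline, 1


def reduce_counts(pairs):
    """Reduce step: add up all of the counts emitted for each byline."""
    totals = defaultdict(int)
    for byline, count in pairs:
        totals[byline] += count
    return dict(totals)


def leaderboard(lines):
    """Run map then reduce, and sort by count (highest first), then by name."""
    pairs = (pair for line in lines for pair in map_story(line))
    totals = reduce_counts(pairs)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample records, one story per line.
sample = [
    "A1\tReporter A\tHeadline one",
    "B3\tReporter B\tHeadline two",
    "A1\tReporter A\tHeadline three",
    "A1\tReporter C\tHeadline four",
]

print(leaderboard(sample))
# [('Reporter A', 2), ('Reporter C', 1)]
```

On a real cluster the framework does the grouping between the two steps and runs many mappers in parallel, one per chunk of lines, which is why having one story per line matters.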
Other tests have been seeing what the breakdown of sections (News, Sports, Business, etc.) has been on the front page, what keywords have been used the most across all 10 years as well as on the front page and, more recently, using the keywords to try to train a Naive Bayes classifier using Mahout. That one didn’t really work well but the idea still intrigues me.
In all the talk of the demise of the newspaper, one thing still bothers me. Newspapers are among the few organizations that have real information about the past, information beyond just the facts. Doing things with this information can only help find the proper place for newspapers and the data they’ve created.
Hadoop isn’t some sort of cure-all for the woes we face but I think it gives a glimpse of how a future news organization could use data to do incredible things and give users a much different relationship with the news, one they would renew every day.
- Happy Thanksgiving
The kids are playing, the wife is out seeing a movie with a friend and I’m checking RSS feeds and going thru browser tabs. We’ll be going to my parents later this afternoon. I’ve seen many posts today taking stock of how things are and where things should go. I should probably do the same.
- Things I’m thankful for…
- I’m employed
- I have an amazing family which keeps me grounded
- I am once again proud of my country after Election Day
- I find new things to learn about almost daily
This barely covers everything I’m thankful for but I’m having difficulty putting them into words.
To finish off, read this post about doing better.
In my opinion, being thankful shouldn’t just be a warm and fuzzy feeling. It should also be a call to take stock of those around you and to do better. I know I am.
- Google Flu Trends
Lots of people are linking to it but Google’s Flu Trends is a pretty amazing site.
With the incredible amount of data Google has access to, you can get insights that were not possible before. I really think the idea that the CDC was up to two weeks behind in noticing the outbreaks says the most.
You can also download the raw data and display it in other ways if you’d like.
- Looking into HBase
HBase is the open source implementation of Google’s Bigtable. I’ve been keeping my eye on it in combination with Hadoop. I had some extra time today so I decided to see how easy it would be to hook it up with the aggregator we built for things like Topics.
One of the nice things about HBase is the REST interface that can read and write data. I hooked up the Ruby client so that whenever I saved posts from the feed to MySQL, it would also send data to HBase.
The writing to HBase is pretty straightforward and the REST client makes it really easy. However, getting the data out needs to be looked at a bit more closely.
HBase is NOT a relational database. If you approach it like it is one, you will get utterly confused and frustrated. Instead, it can be thought of as a collection of Maps. So, in order to get data out, you need to iterate over the Maps looking for particular columns.
When you use the REST API, you do this by creating a scanner and pop’ing the results off it, like from a queue.
That’s some of what I found out. Let’s see what else I can dig into today.
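The “collection of Maps” idea and the scanner can be sketched in a few lines of plain Python. This is only a mental model: it does not talk to HBase or use the REST client, and the table contents, column names and the `scan` function are all made up for illustration. The only HBase facts it leans on are that rows are kept sorted by row key and that columns are named family:qualifier.

```python
def scan(table, wanted_columns):
    """Yield (row_key, cells) for each row holding any of the wanted columns.

    `table` is a dict of row key -> dict of column name -> value, a stand-in
    for an HBase table. Rows come back in row-key order.
    """
    for row_key in sorted(table):
        cells = table[row_key]
        found = {col: cells[col] for col in wanted_columns if col in cells}
        if found:
            yield row_key, found


# Hypothetical table of saved feed posts. Rows need not share the same columns.
posts = {
    "post-002": {"content:title": "Second post", "meta:feed": "feed-b"},
    "post-001": {"content:title": "First post", "content:body": "Hello"},
    "post-003": {"meta:feed": "feed-a"},
}

scanner = scan(posts, ["content:title"])

# Pop the first result off the scanner, then drain the rest.
print(next(scanner))
for row_key, cells in scanner:
    print(row_key, cells)
```

There is no WHERE clause here. You walk the rows and pick out the columns you care about, which is why approaching it with a SQL mindset gets frustrating.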
- Doing My Civic Duty
Happy to be waiting in line!
- What To Watch For On Election Night
Nate Silver, creator of FiveThirtyEight.com, has written a great breakdown of how election night might go and what to keep your eye on.

