18 Minutes

'18 Minutes'

I’ve enjoyed reading Mr. Bregman’s column for the Harvard Business Review for a good while, and he has collected many of those columns into an excellent book about goals, focus, distractions and getting the right things done. I am the first to admit to my business / self-help book compulsion: if there’s a book about goals and ways to do things better, I will read it.

One of the nice things about this book is its progression from the macro to the micro. It starts by having you pick a small number of things to focus on for the upcoming year, things that would improve your life both personally and professionally. From there, Mr. Bregman brings you down to the daily level and helps you break down your to-do list with an eye towards the yearly focus. One great thing is his recognition that things come up during the day which you need to deal with but which have no relation to your overall yearly focus. He calls this the ‘Other 5%’ and it’s nice to have some place for it. If you can spend 95% on your focus and leave room for this, you’ll be much further along than most. Finally, the progression ends at what you are doing moment-to-moment. This is where everything either works or doesn’t. You can have the greatest plan in the world, but if you don’t stay focused on it, it won’t do anything but laugh at you from afar.

Overall, I liked this book and look forward to continuing reading Mr. Bregman’s column and future books.

Chomp – One Year Later

I arrived in Oakland, took the BART to Civic Center, walked down Market until I came to 10th, made a left and then walked past Mission until I came to Chomp HQ. It was lunchtime so as I rang the doorbell, Ben and Cathy (our CEO and CTO respectively) were walking out on their way to lunch. I came in, set my bags down and headed out to lunch with the rest of the team. That was my introduction to becoming a Chomp’er.

Since then, the company has grown, expanded our office space and overall just kept getting better. There have been down parts as well, but I’m not going to dwell on those now. This post is about the future: looking at the year behind me and forward to a year from now, when I expect Chomp to be even bigger and better.

A Peek Inside the Algorithm

By way of introduction, I should say that I work for a company that competes in some instances with Google, and I specifically work in the group that is building a search platform for apps. My point is that my comments aren’t necessarily unbiased.

All of that being said, I enjoyed the article about Google’s search algorithm. It was interesting to see how THE company in search tweaks results and tests new ideas. I did think the article was a fairly obvious tactic for reminding everyone that Google isn’t going down without a fight, no matter how many other products it launches or how many competitors join the fray.

PR ploy or not, some things to take away from the article:

1. When working on changes to the ranking of results, get people in a room, show two sets of results (before and after) and talk through them.

2. Use previous searches, whether good or bad, as data points for understanding users’ intent and future searches.

3. Test, test, test… especially on actual users in actual search situations. The article mentions that almost every search being done is happening within some test group.

4. While you don’t want to tune for only specific queries, use them as a way to discuss problems and talk about solutions.

My Themeword for 2010: (RE)BUILD

I know I’m a day late, but hopefully not a dollar short. I wanted to put together some thoughts yesterday about the year that was and what I’m hoping for in the upcoming year, but family, friends and football conspired against me. Of course, I honestly wouldn’t have it any other way. If posting to a blog becomes more important than being with the people you love, well, there’s a real problem there.

For the past few years, people have posted their theme words for the year instead of just resolutions. I think this is a great idea as it can give better focus to the things you want to accomplish and help you gauge where you are at throughout the year. Tara’s word is ACHIEVE while Erica has chosen ADVENTURE. There are plenty more out there.

My word for 2010 is (RE)BUILD.

2009 was a tough year for me on a personal level. I had some things I had to face up to and because of that, people and things had to be allowed to fade away.

Well, 2010 is going to be about building those relationships back to where they were and beyond. I’m especially talking about my two best friends, who have been extremely patient with me even though I didn’t pull my weight in those friendships last year.

But my word isn’t only about relationships. I also want to recapture the building of code, apps and ideas and share them with the world. I want to explore new technologies and thoughts, putting things up for people to see.

There is plenty more I look forward to doing in 2010 but I want everything to flow out of my word: (RE)BUILD.

The Whuffie Factor

Trent Reznor of Nine Inch Nails posted an amazing blueprint for what a new or unknown musical artist should do and focus on in trying to gain a following. The basic gist is that the creation of music is just the beginning of the relationship with fans and if it just ends there, you will not be successful no matter your definition of success.

His advice boils down to a list:

- Have your MySpace page, but get a site outside MySpace – it’s dying and reads as cheap / generic.
- Remove all Flash from your website. Remove all stupid intros and load-times.
- MAKE IT SIMPLE TO NAVIGATE AND EASY TO FIND AND HEAR MUSIC (but don’t autoplay).
- Constantly update your site with content – pictures, blogs, whatever. Give people a reason to return all the time.
- Put up a bulletin board and start a community. Engage your fans (with caution!).
- Make cheap videos. Film yourself talking. Play shows. Make interesting things.
- Get a Twitter account. Be interesting. Be real.
- Submit your music to blogs that may be interested.
- NEVER CHASE TRENDS.
- Utilize the multitude of tools available to you for very little cost, if any – Flickr / YouTube / Vimeo / SoundCloud / Twitter etc.

All of the above can really be boiled down to one word: whuffie. Yes, that wacky term from Cory Doctorow’s Down and Out in the Magic Kingdom, but the idea is relevant to everyone, since the more you interact with people, the more ways you are judged.

Each time you Google someone’s name to see what they are about or what they’ve done, you are checking out their whuffie. The open source world thrives on the idea of whuffie. You know people based on their IRC handle or email address. When someone earns the right to commit code to the main repository, they have done so by creating enough whuffie to be trusted.

All of this comes together in Tara Hunt’s new book, The Whuffie Factor. The main focus of the book is how companies can increase their whuffie with customers, how they can focus on creating conversations and communities. This whuffie is enhanced by interactions on social media, things like Flickr and Twitter or even Facebook. Every company is different, and how it interacts with its customers needs to flow from within the company, not just mimic what others have done. There’s nothing worse than seeing a company try too hard at this and come off as shallow or insincere.

I read the book thinking how my company and even my parent company could do more to create the positive whuffie necessary for survival in the 21st century. It seems like such a big task, one that hardly feels feasible. But that’s the challenge for each of us: find places where we can step in and move our companies toward a customer-centric outlook instead of a purely revenue-centric one.

The only critique I have of the book is not really a fair one. I was looking for a silver bullet for increasing my personal whuffie, not just the company’s. My hope was much the same as with any of the writing books I’ve picked up through the years: that I’d find the secret, one that didn’t require much work but instead a simple formula and BOOM!, I’d be an author.

But that isn’t how you write, nor is it how you create your whuffie. You do so with each blog post, each tweet, each Flickr photo, each comment on someone else’s blog, each time you get involved in something larger than yourself. That’s how you build whuffie up. Tara has created an amazing amount of whuffie for herself in all that she has done, and I’m very glad she was able to share some of the ways she did it in this book. Definitely pick it up if you have the chance; it is well worth it.

Just as a disclaimer, Tara’s publisher sent me a review copy. I didn’t have to promise anything in return, and I definitely would have purchased the book on its first day if it had been necessary. In fact, I probably should anyway, just to support @missrogue.

Using Solr’s AbstractSolrTestCase

This past week I worked with Solr’s AbstractSolrTestCase, which extends JUnit’s TestCase. In theory, this makes it easier to create tests that hit an index and run through the entire search pipeline if necessary.

Unfortunately, there isn’t much documentation to help out, but there are plenty of examples within Solr’s source.

That being said, here are a few things I found out while working with it.

Because of the way the setUp method works, I needed to duplicate much of its functionality instead of calling super.setUp(). By default, setUp creates the data directory for Solr in java.io.tmpdir (generally /tmp on Unix systems), named with the class name plus a timestamp. This was a problem for us because it meant the index would be created in a new directory for every test.

I realize the need for having atomic data for unit tests, but I viewed these Solr tests more as integration tests than true unit tests. They were going through the entire system as opposed to focusing on just one class or section.

To create the index, we were hooking pieces up to our current indexing pipeline, a very nice plug-in system we developed that runs data through various stations to either clean it or retrieve more of it. Thankfully, only a few places actually interacted with Solr, so I was able to mock that communication out and simply feed the collected data to the adoc / update methods.

Because the pipeline wasn’t instantaneous, I wanted to reuse the indexes as much as possible. I figured a good middle ground would be for each test class to have its own index and to allow it to tell the indexing pipeline what data it wanted to index. That index would stay until its physical directory was deleted, at which point it would be recreated with updated data.

So I basically had to copy much of the existing setUp method, create the data directory with the test class name but no timestamp, and make the tearDown method a no-op.
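
Roughly, the customized pieces look like the sketch below. The class and method names here are my own illustration, not Solr’s API (the real code overrides setUp() and tearDown() on AbstractSolrTestCase, which I’m not pulling in here); the point is just the stable, timestamp-free directory scheme:

```java
import java.io.File;

// Sketch of the approach described above: give each test class a stable
// Solr data directory (class name, no timestamp) so the index can be
// reused across runs. Class and method names are hypothetical.
public class PersistentIndexDir {

    // Build the per-test-class data directory path under java.io.tmpdir.
    // Unlike the default setUp, no timestamp is appended, so every run of
    // the same test class points at the same directory.
    public static File dataDirFor(String testClassName) {
        return new File(System.getProperty("java.io.tmpdir"), testClassName);
    }

    // In the real subclass this logic lives in the overridden setUp():
    // create the directory only if it is missing, then point Solr's
    // dataDir at it. Deleting the directory on disk forces a rebuild
    // with fresh data on the next run.
    public static File ensureDataDir(String testClassName) {
        File dir = dataDirFor(testClassName);
        if (!dir.exists()) {
            dir.mkdirs(); // index will be (re)built into the new directory
        }
        return dir;
    }

    // tearDown() then becomes a no-op: the index is intentionally left
    // behind so the next test run can reuse it.

    public static void main(String[] args) {
        System.out.println(ensureDataDir("MySearchIntegrationTest"));
    }
}
```

The deciding factor was that deleting a directory is an explicit, visible action, so stale test data never lingers by accident without someone having chosen to keep it.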

With all of this done, I now have a class which any developer can extend which hopefully will increase our test coverage.

A Good Day for Hadoop

Yesterday was a very good day for the Hadoop project.

Yahoo! announced that it used a roughly 3,800-node cluster to sort through a petabyte of data in a little over 16 hours. It’s an amazing feat for any project, but especially for one with as much potential as Hadoop.

The other good news was the release of mrtoolkit, a map-reduce library written in Ruby. It utilizes Hadoop Streaming and will make it easy to run jobs and crunch data. It comes out of the New York Times dev group and I applaud them.

I’ll have to figure out what the difference between mrtoolkit and Wukong is, so hopefully some sort of merging of the two can happen.

Testing with Redis

Long time, no blog… But enough about that.

On the side, I’ve been working on a new aggregator, Aggir, which allows me to test various things. I started off using SQLite and Sequel for storage, put Solr behind the scenes for search and added a very simple Web UI using Sinatra and HAML. Yeah, I think I pretty much used all the necessary hot projects. It was fun to build and it works pretty well right now.

I have more to do on the Solr front since I’m just using the defaults for relevance searching. I’d like to dig more into the Solr internals for additional query parsing and classification at index time. It’s some of the stuff I’ve been doing at work but wanted to use a different type of data set.

Of course, now that I had things somewhat stable, I decided to blow it all up and try something new. That something new is Redis using Ezra’s client library.

I started down the path of updating everything, ripping out the database storage to use Redis instead. So far so good; I have the start of this on a branch. One issue I found, though, was testing my code. It was simple with Sequel, since I could create a separate database without any worry of overwriting real data. With Redis, I can easily delete keys between tests, but the keys were the same ones a real update would use, so non-test data would be deleted as well.

I think I’ve come up with a solution that at least is working for me. I’ve made each key combine a prefix with other data. The prefixes are defined as class variables, and I only set them in the library code if they haven’t already been defined elsewhere. In my test code, I set them with an additional test-specific prefix so that I can easily delete all of the testing keys using the keys('test_*') method. This lets me walk through all of the test keys created during a test and delete them before running the next one, mirroring what is done with the database.
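
A minimal sketch of the scheme, with made-up class and key names (the real code goes through Ezra’s Redis client; here the client can be anything that responds to keys and del):

```ruby
# Sketch of the prefixing scheme described above. PostStore and the
# 'post' / 'test_post' prefixes are illustrative names, not my real code.
class PostStore
  # Library default, only applied if the prefix hasn't already been
  # defined elsewhere (e.g. by test code loaded first).
  @@prefix = defined?(@@prefix) ? @@prefix : 'post'

  # Tests override this with a test-specific prefix like 'test_post'.
  def self.prefix=(p)
    @@prefix = p
  end

  # Every key combines the prefix with the rest of the key data.
  def self.key_for(id)
    "#{@@prefix}_#{id}"
  end

  # Between tests, delete everything under the test prefix, mirroring
  # redis.keys('test_*') followed by deletes. Real data keys ('post_*')
  # never match, so they survive.
  def self.clear_test_keys(redis)
    redis.keys('test_*').each { |k| redis.del(k) }
  end
end
```

In a test’s setup I’d set PostStore.prefix = 'test_post', so every key the test writes starts with test_ and clear_test_keys wipes only those, leaving the real posts in the same Redis instance untouched.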

I’m now able to test on the same instance that I’ve loaded with posts from various blogs. I have more to say about the mindset change from a relational db to key-value storage but I wanted to get this post out.

EarthLink, Short Term Profit but Long Term?

I worked for EarthLink three different times, so it always holds a special place in my heart. It is tough to read things like this, though. With all the cuts they’ve made, they were profitable in 2008, but what does the future hold?

I’ve talked to the few people still there, and on the technical side it really is just a skeleton operation; eventually even that will need to be cut. It really is too bad, since there was always so much promise but not as much execution.

All of this leaves EarthLink without a clear growth strategy. Once dial-up dies off, the company has no wireless or fixed infrastructure of its own to offer competing services. And even though cost-cutting has helped the company return to profitability, it won’t help solve the company’s fundamental problem, which is a lack of future strategy.

The Numerati

If asked for a list of books that give a basic overview of the things I do as a coder, I usually suggest Microserfs, Hackers and Painters and maybe something like The Cathedral and the Bazaar. Now, though, I think I’ll add The Numerati to that list. It isn’t that my work makes me one of the Numerati, but the book does give a view of how the world is changing and what sorts of things computer systems will be handling in the future.

I enjoyed this book quite a bit. My only quibble was the lack of real meat in the discussion of the math and the systems, but that’s understandable since this is a book for the mainstream, not geeks like me.

The ability to take large amounts of data and analyze it would seem to be something only companies could do, but I think individuals can do it on their own now. With the combination of EC2, Hadoop and Mahout, you could become one of the Numerati yourself.