  • A Peek Inside the Algorithm

    By way of introduction, I should say that I work for a company that competes with Google in some instances, and I specifically work in the group that is building a search platform for apps to use. My point is that my comments aren’t necessarily unbiased.

    All of that being said, I enjoyed the article about Google’s search algorithm. It was interesting to see how THE company in search tweaks results and tests new ideas. I did think the article was a fairly obvious tactic for reminding everyone that Google isn’t going down without a fight, no matter how many other applications they launch or how many competitors join the fight.

    PR ploy or not, some things to take away from the article:

    1. When working on changes to the ranking of results, get people in a room, show two sets of results (before and after) and talk through them.

    2. Use previous searches, whether good or bad, as data points for understanding the user’s intent and future searches.

    3. Test, test, test… Especially on actual users in actual search situations. The article mentions that almost every search happens within some test group.

    4. While you don’t want to only solve certain queries, use them as a way to discuss problems and talk about solutions.

    Posted on Mar 01.10 to Uncategorized | No Comments »  

  • My Themeword for 2010: (RE)BUILD

    I know I’m a day late but hopefully not a dollar short. Yesterday I wanted to put together some thoughts about the year that was and what I’m hoping for in the upcoming year, but family, friends and football conspired against me. Of course, I honestly wouldn’t have it any other way. If posting to a blog becomes more important than being with the people you love, well, there’s a real problem there.

    For the past few years, people have posted their theme words for the year instead of just resolutions. I think this is a great idea as it can give better focus to the things you want to accomplish and help you gauge where you are at throughout the year. Tara’s word is ACHIEVE while Erica has chosen ADVENTURE. There are plenty more out there.

    My word for 2010 is (RE)BUILD.

    2009 was a tough year for me on a personal level. I had some things I had to face up to and because of that, people and things had to be allowed to fade away.

    Well, 2010 is going to be about building those relationships back to where they were and beyond. I’m especially talking about my two best friends, who have been extremely patient with me even though I didn’t pull my weight in the friendships last year.

    But I’m not only talking about interpersonal communication with my word. I also want to recapture the building of code, apps and ideas and sharing them with the world. I want to explore new technologies and thoughts, putting things up for people to see.

    There is plenty more I look forward to doing in 2010 but I want everything to flow out of my word: (RE)BUILD.

    Posted on Jan 02.10 to Me | 1 Comment »  

  • The Whuffie Factor

    Trent Reznor of Nine Inch Nails posted an amazing blueprint for what a new or unknown musical artist should do and focus on in trying to gain a following. The basic gist is that the creation of music is just the beginning of the relationship with fans and if it just ends there, you will not be successful no matter your definition of success.

    Have your MySpace page, but get a site outside MySpace – it’s dying and reads as cheap / generic. Remove all Flash from your website. Remove all stupid intros and load-times. MAKE IT SIMPLE TO NAVIGATE AND EASY TO FIND AND HEAR MUSIC (but don’t autoplay). Constantly update your site with content – pictures, blogs, whatever. Give people a reason to return to your site all the time. Put up a bulletin board and start a community. Engage your fans (with caution!) Make cheap videos. Film yourself talking. Play shows. Make interesting things. Get a Twitter account. Be interesting. Be real. Submit your music to blogs that may be interested. NEVER CHASE TRENDS. Utilize the multitude of tools available to you for very little cost of any – Flickr / YouTube / Vimeo / SoundCloud / Twitter etc.

    All of the above can really be boiled down to one word: whuffie. Yes, it’s that wacky term from Cory Doctorow’s Down and Out in the Magic Kingdom, but the idea is very relevant to anyone, since the more you interact with people, the more ways you are judged.

    Each time you Google someone’s name to see what they are about or what they’ve done, you are checking out their whuffie. The open source world thrives on the idea of whuffie. You know people based on their IRC handle or email address. When someone has earned the right to commit code into the main repository, they have done so by creating enough whuffie that they are trusted.

    All of this comes together in Tara Hunt’s new book, The Whuffie Factor. The main focus of the book is how companies can increase their whuffie with customers and how they can focus on creating conversations and communities. This whuffie is enhanced by interactions with social media, things like Flickr and Twitter or even Facebook. Every company is different, and how they interact with their customers needs to flow from within the company, not just from what others have done. There’s nothing worse than seeing a company try too hard to do this and come off as shallow or insincere.

    I read the book thinking how my company and even my parent company could do more to create the positive whuffie necessary for survival in the 21st century. It seems like such a big task, one that doesn’t seem feasible. But that’s the challenge for each of us: find places where we can interject and move our companies to have a customer-centric outlook instead of only viewing things from a purely revenue-centric model.

    The only critique I have for the book is not really a fair one. I was looking for a silver bullet on how I could increase my personal whuffie instead of just the company’s. My hope was much the same as any writing books I’ve picked up through the years, that I’d find the secret, one that didn’t require much work but instead a simple formula and BOOM!, I’d be an author.

    But that isn’t how you write, nor is it how you create your whuffie. You do so with each blog post, each Tweet, each Flickr photo, each comment on someone else’s blog, each time you get involved in something larger than yourself. That’s how you build whuffie up. Tara has created an amazing amount of whuffie for herself in all that she has done. I’m very glad she was able to share some of the ways she did it in the book. Definitely pick it up if you have the chance. It is well worth it.

    Just as a disclaimer, Tara’s publisher sent me a review copy. I didn’t have to promise anything in return and I definitely would have purchased the book on its first day if it had been necessary. In fact, I probably should just to support @missrogue.

    Posted on Jul 11.09 to Books, Whuffie | No Comments »  

  • Using Solr’s AbstractSolrTestCase

    This past week I worked on utilizing Solr’s AbstractSolrTestCase, which extends JUnit’s TestCase. In theory, this makes it easier to create tests that hit an index and run thru the entire search pipeline if necessary.

    Unfortunately, there aren’t a ton of docs to help out, but there are plenty of examples within Solr’s source to help.

    That being said, here are a few things I found out while working with it.

    Because of the way the setUp method worked, I needed to basically duplicate much of its functionality instead of calling super.setUp(). By default, the setUp method creates the data directory for Solr under java.io.tmpdir (generally /tmp on Unix systems), in a directory named after the class plus a timestamp. This was a problem for us because it meant that the index would be created in a new directory for each test.

    I realize the need for having atomic data for unit tests but I viewed these Solr tests more as integration tests than true unit tests. They were going thru the entire system as opposed to focusing on just one class or section.

    To create the index, we were hooking pieces up to our current indexing pipeline, a very nice plug-in system we developed to go through various stations to either clean data or retrieve more of it. Thankfully only a few places actually interacted with Solr so I was able to mock that communication out and just use the data collected and give it to the adoc / update methods.

    Because the pipeline wasn’t instantaneous, I wanted to reuse the indexes as much as possible. I figured a good middle ground for this would be for each test class to have its own index and to allow it to give the indexing pipeline information about what data it wanted to index. That index would stay until its physical directory was deleted, and then it would be recreated with updated data.

    So I basically had to copy much of the existing setUp method and create the data directory with the test class name but no timestamp as well as make the tearDown method a no-op.
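    The real class is Java and extends AbstractSolrTestCase, so the following is not that code. It is only a minimal Ruby sketch of the directory decision described above: a timestamped directory per test versus a stable directory per test class that is rebuilt only when it is missing. The helper names (timestamped_data_dir, stable_data_dir, ensure_index) are hypothetical, and the block passed to ensure_index stands in for the indexing pipeline.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: a fresh directory per test run, similar in spirit to
# the default behaviour described above (class name plus a timestamp).
def timestamped_data_dir(test_class_name, now = Time.now)
  File.join(Dir.tmpdir, "#{test_class_name}-#{(now.to_f * 1000).to_i}")
end

# Hypothetical helper: one stable directory per test class, no timestamp.
def stable_data_dir(test_class_name)
  File.join(Dir.tmpdir, test_class_name)
end

# Reuses the index directory when it already exists; otherwise creates it and
# yields so the caller can run its (slow) indexing step exactly once.
# Returns the directory and whether it was :built or :reused.
def ensure_index(test_class_name)
  dir = stable_data_dir(test_class_name)
  return [dir, :reused] if File.directory?(dir)

  FileUtils.mkdir_p(dir)
  yield dir if block_given?
  [dir, :built]
end
```

    With a no-op tearDown, nothing removes the stable directory between tests, so deleting it by hand is what triggers a rebuild on the next run.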

    With all of this done, I now have a class which any developer can extend which hopefully will increase our test coverage.

    Posted on Jun 28.09 to Development, Search, Solr | 2 Comments »  

  • A Good Day for Hadoop

    Yesterday was a very good day for the Hadoop project.

    Yahoo! announced they used a roughly 3800 node cluster to sort thru a petabyte of data in a little over 16 hours. It’s an amazing feat for any project but especially one with as much potential as Hadoop.

    The other good news was the release of mrtoolkit, a map-reduce library written in Ruby. It utilizes Hadoop Streaming and will make it easy to run jobs and crunch data. It comes out of the New York Times dev group and I applaud them.

    I’ll have to figure out what the difference between mrtoolkit and Wukong is, so hopefully some sort of merging of the two can happen.

    Posted on May 12.09 to Hadoop, Ruby | No Comments »  

  • Testing with Redis

    Long time, no blog… But enough about that.

    On the side, I’ve been working on a new aggregator, Aggir, which allows me to test various things. I started off using SQLite and Sequel for storage, put Solr behind the scenes for search and added a very simple Web UI using Sinatra and HAML. Yeah, I think I pretty much used all the necessary hot projects. It was fun to build and it works pretty well right now.

    I have more to do on the Solr front since I’m just using the defaults for relevance searching. I’d like to dig more into the Solr internals for additional query parsing and classification at index time. It’s some of the stuff I’ve been doing at work but wanted to use a different type of data set.

    Of course, now that I had things somewhat stable, I decided to blow it all up and try something new. That something new is Redis using Ezra’s client library.

    I started down the path of updating everything, ripping out the database storage to use Redis instead. So far so good, I have the start of this on a branch. One issue I found though was testing my code. It was simple with Sequel since I could create a different database without any worry of overwriting real data. With Redis, I can easily delete keys in between tests, but the keys were the same ones that a real update would use, so non-test data would be deleted.

    I think I’ve come up with a solution that at least is working for me. I’ve made each key combine a prefix with other data. The prefixes are defined as class variables. I only set them in the library code if they haven’t already been defined elsewhere. In my test code, I set them with an additional test-specific prefix so that I can easily find all of the testing keys by using the keys('test_*') method. This allows me to walk thru all of the test keys created during a test and delete them before running the next test. This mirrors what is done with the database.
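    Here is a minimal sketch of that prefix idea, not the actual Aggir code. FakeRedis is a hypothetical in-memory stand-in so the example runs without a server or a client library, and PostStore, post_prefix= and clear_test_keys are names made up for illustration. The only Redis behaviour assumed is setting, getting and deleting keys, plus a glob-style keys lookup.

```ruby
# Hypothetical in-memory stand-in for a Redis client (assumption, not a real
# client library); it only mimics the handful of calls used below.
class FakeRedis
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  # Glob-style key lookup, in the spirit of the Redis KEYS command.
  def keys(pattern)
    @data.keys.select { |key| File.fnmatch(pattern, key) }
  end

  # Deletes the given keys and returns how many were actually removed.
  def del(*keys)
    keys.count { |key| @data.key?(key) && (@data.delete(key) || true) }
  end
end

# Hypothetical library class: every key is a prefix plus other data.
class PostStore
  # Library default, applied only if nothing defined the prefix earlier.
  @@post_prefix = "post" unless class_variable_defined?(:@@post_prefix)

  # Test code calls this to switch to a test-specific prefix.
  def self.post_prefix=(prefix)
    @@post_prefix = prefix
  end

  def initialize(redis)
    @redis = redis
  end

  def key_for(id)
    "#{@@post_prefix}:#{id}"
  end

  def save(id, title)
    @redis.set(key_for(id), title)
  end

  def find(id)
    @redis.get(key_for(id))
  end
end

# Walks every test key and deletes it, leaving non-test keys untouched.
def clear_test_keys(redis)
  test_keys = redis.keys("test_*")
  redis.del(*test_keys) unless test_keys.empty?
end
```

    In a test setup, something like PostStore.post_prefix = "test_post" would run before any data is written, and clear_test_keys(redis) would run between tests. Keys written under the default "post" prefix never match the test_* pattern, which is what keeps the real data safe on a shared instance.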

    I’m now able to test on the same instance that I’ve loaded with posts from various blogs. I have more to say about the mindset change from a relational db to key-value storage but I wanted to get this post out.

    Posted on Mar 18.09 to Code, Data, Databases, Development, Ruby | 2 Comments »  

  • EarthLink, Short Term Profit but Long Term?

    I worked for EarthLink three different times so it always holds a special place in my heart. It is tough to read things like this though. With all the cuts they’ve done, they were profitable in 2008 but really what does the future hold?

    I’ve talked to the few people still there and it really is just a skeleton operation technically and eventually that will need to be cut. It really is too bad since there was always so much promise but really not as much execution.

    All of this leaves EarthLink without a clear growth strategy. Once dial-up dies off, the company has no wireless or fixed infrastructure of its own to offer competing services. And even though cost-cutting has helped the company return to profitability, it won’t help solve the company’s fundamental problem, which is a lack of future strategy.

    Posted on Feb 06.09 to EarthLink, Work | 4 Comments »  

  • The Numerati

    If asked for a list of books which give a basic overview of the things I do as a coder, I usually suggest Microserfs, Hackers and Painters and maybe something like The Cathedral and the Bazaar. Now though, I think I’ll add The Numerati to that list. It isn’t that my work makes me one of the Numerati but it does give a view of how the world is changing and what sorts of things computer systems will be handling in the future.

    I enjoyed this book quite a bit. My only quibble was the lack of real meat in the discussion about the math and the systems but it’s understandable since this was a book for the mainstream not geeks like me.

    The ability to take large amounts of data and analyze it would seem to be something only companies would be able to do but I think individuals can do their own now. You could use the combination of EC2, Hadoop and Mahout and become a Numerati yourself.

    Posted on Feb 03.09 to Books, TheNumerati | No Comments »  

  • The McGwire Brothers

    Deadspin posted about Mark McGwire’s brother, Jay, shopping around a book proposal that tells the truth about Mark’s use of steroids and how Jay was the first to inject him.

    This is pretty weird for me because I played football with Jay when he was a senior and I was a junior. There were always little bits of chatter about his strength and workout routine and some speculated he was getting help.

    It’s a crazy, connected world sometimes.

    Posted on Jan 22.09 to Me | No Comments »  

  • The New Marching Orders

    Now, there are some who question the scale of our ambitions — who suggest that our system cannot tolerate too many big plans. Their memories are short. For they have forgotten what this country has already done; what free men and women can achieve when imagination is joined to common purpose, and necessity to courage.

    Such an amazing speech!

    Posted on Jan 20.09 to Me | No Comments »  

  • Copyright © 2007 by lucasjosh.com. All rights reserved.
