Using Solr’s AbstractSolrTestCase

This past week I worked with Solr’s AbstractSolrTestCase, which extends JUnit’s TestCase. In theory, this makes it easier to create tests that hit an index and run through the entire search pipeline if necessary.

Unfortunately, there aren’t a ton of docs, but there are plenty of examples within Solr’s source to help.

That being said, here are a few things I found out while working with it.

Because of the way the setUp method works, I needed to duplicate much of its functionality instead of calling super.setUp(). By default, the setUp method creates the data directory for Solr under java.io.tmpdir (generally /tmp on Unix systems), named after the test class plus a timestamp. This was a problem for us because it meant that every test would build its index in a new directory.

I realize the need for having atomic data for unit tests, but I viewed these Solr tests more as integration tests than true unit tests. They were going through the entire system as opposed to focusing on just one class or section.

To create the index, we were hooking pieces up to our current indexing pipeline, a very nice plug-in system we developed that goes through various stations to either clean data or retrieve more of it. Thankfully only a few places actually interacted with Solr, so I was able to mock that communication out and pass the collected data straight to the adoc / update methods.

Because the pipeline wasn’t instantaneous, I wanted to reuse the indexes as much as possible. I figured a good middle ground would be for each test class to have its own index and to allow it to tell the indexing pipeline what data it wanted to index. That index would stay until its directory on disk was deleted, and then it would be recreated with updated data.

So I basically had to copy much of the existing setUp method and create the data directory with the test class name but no timestamp as well as make the tearDown method a no-op.
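Here is a minimal sketch of just the directory part of that idea, written without any Solr dependency. StableIndexDir and its method names are hypothetical (they are not part of Solr), and how the resulting path gets handed to Solr depends on the Solr version, so that step is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: resolves one reusable index directory per test class. */
final class StableIndexDir {

    private StableIndexDir() {}

    /**
     * Returns java.io.tmpdir + the simple class name, with no timestamp suffix,
     * so repeated runs of the same test class resolve to the same directory.
     */
    static Path forTestClass(Class<?> testClass) {
        return Paths.get(System.getProperty("java.io.tmpdir"), testClass.getSimpleName());
    }

    /**
     * Creates the directory if it is missing. Returns true when it had to be
     * created, which is the signal that the index needs to be (re)built.
     */
    static boolean ensureExists(Path dataDir) throws IOException {
        if (Files.isDirectory(dataDir)) {
            return false; // directory already there: reuse the existing index
        }
        Files.createDirectories(dataDir);
        return true; // fresh directory: caller should run the indexing pipeline
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = forTestClass(StableIndexDir.class);
        boolean needsIndexing = ensureExists(dataDir);
        System.out.println(dataDir + " needsIndexing=" + needsIndexing);
    }
}
```

A setUp override would call something like ensureExists and only run the indexing pipeline when it returns true. tearDown stays empty so the directory survives until someone deletes it by hand.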

With all of this done, I now have a base class that any developer can extend, which will hopefully increase our test coverage.

Testing with Redis

Long time, no blog… But enough about that.

On the side, I’ve been working on a new aggregator, Aggir, which allows me to test various things. I started off using SQLite and Sequel for storage, put Solr behind the scenes for search and added a very simple Web UI using Sinatra and HAML. Yeah, I think I pretty much used all the necessary hot projects. It was fun to build and it works pretty well right now.

I have more to do on the Solr front since I’m just using the defaults for relevance searching. I’d like to dig more into the Solr internals for additional query parsing and classification at index time. It’s some of the stuff I’ve been doing at work but wanted to use a different type of data set.

Of course, now that I had things somewhat stable, I decided to blow it all up and try something new. That something new is Redis using Ezra’s client library.

I started down the path of updating everything, ripping out the database storage to use Redis instead. So far so good; I have the start of this on a branch. One issue I found, though, was testing my code. It was simple with Sequel since I could create a different database without any worry of overwriting real data. With Redis, I can easily delete keys in between tests, but the keys were the same ones that a real update would use, so non-test data would be deleted.

I think I’ve come up with a solution that at least is working for me. I’ve made each key combine a prefix with other data. The prefixes are defined as class variables, and the library code only sets them if they haven’t already been defined elsewhere. In my test code, I set them with an additional test-specific prefix so that I can find all of the testing keys with the keys('test_*') method. That lets me walk through all of the test keys created during a test and delete them before running the next test. This mirrors what is done with the database.
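To make the prefix idea concrete, here is a rough sketch in Java with an in-memory map standing in for Redis. None of these names come from the real project or from any Redis client: FakeStore, PostKeys and deleteTestKeys are all hypothetical, and the pattern matching only handles a trailing star.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a Redis connection; NOT a real client API. */
final class FakeStore {
    private final Map<String, String> data = new TreeMap<>();

    void set(String key, String value) { data.put(key, value); }

    String get(String key) { return data.get(key); }

    void delete(String key) { data.remove(key); }

    /** Supports only an exact key or a trailing-star pattern such as "test_*". */
    List<String> keys(String pattern) {
        boolean wildcard = pattern.endsWith("*");
        String stem = wildcard ? pattern.substring(0, pattern.length() - 1) : pattern;
        List<String> matches = new ArrayList<>();
        for (String key : data.keySet()) {
            if (wildcard ? key.startsWith(stem) : key.equals(stem)) {
                matches.add(key);
            }
        }
        return matches;
    }
}

/** Hypothetical key builder: every key is a prefix plus other data. */
final class PostKeys {
    /** Library default; test code overrides it before touching the store. */
    static String prefix = "post_";

    private PostKeys() {}

    static String forId(long id) { return prefix + id; }
}

final class PrefixedKeysDemo {
    private PrefixedKeysDemo() {}

    /** Deletes every key starting with "test_" and returns how many were removed. */
    static int deleteTestKeys(FakeStore store) {
        List<String> testKeys = store.keys("test_*"); // a copy, so deleting while looping is safe
        for (String key : testKeys) {
            store.delete(key);
        }
        return testKeys.size();
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.set(PostKeys.forId(1), "real post");   // stored under post_1

        PostKeys.prefix = "test_post_";              // what the test setup would do
        store.set(PostKeys.forId(1), "fixture");     // stored under test_post_1

        int removed = deleteTestKeys(store);
        System.out.println("removed=" + removed + " real=" + store.get("post_1"));
    }
}
```

The point of the design is that cleanup never needs to know which keys a test created. Anything under the test prefix is fair game, and anything else is left alone.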

I’m now able to test on the same instance that I’ve loaded with posts from various blogs. I have more to say about the mindset change from a relational db to key-value storage but I wanted to get this post out.

Bridges to the Future

Kevin Matheny has written a really excellent piece at BusinessWeek, extolling the virtues of agile software development. I think it can be one of the toughest battles within a large organization, but if you win and are allowed to be flexible, the benefits easily outweigh any struggles you’ll have.

What this means for managing projects—including any project that relies on the Internet to deliver its value proposition—is simple: The longer your project timeline, the greater the risk that what you deliver will not be what you or your customers need when you deliver it. Not only are longer-term projects more likely to fail due to changes in requirements or conditions during the project, they’re more expensive. This increases the cost of failure. And because we can only do a few of them in a year, the impact of any one failure is huge.

The Importance of being Developer-Friendly

I’ve been working with a legacy framework the last couple of weeks. It’s something that’s been in production for over 6 years and you can definitely tell. There are files checked in with dates as extensions, always a sure sign of legacy. It’s obvious that there haven’t been any new developers coming onto the project because the documentation is out-of-date and the entire process is wonky.

A very large problem is that everything needs to be set up with production paths or it is almost impossible to get up and running. I’ve been spoiled by how Rails handles environments. Being able to separate what is for local development, what is for QA testing and what is for production is an amazing way of allowing developers to get involved quickly and easily.

If you don’t have that, developers will flail around, searching through config files or trying to follow stack traces, hoping some information can be gleaned from error messages. It isn’t easy, it’s very frustrating, and it generally causes someone to lose all interest in future development.

Some things I’ve liked recently

Here are a few things I’ve come across the last few days which are pretty nifty…

Trying to run CouchDB

Update: I’m happy to say I got CouchDB working with all the dependencies. In order for things to link correctly, I used the patch that DarwinPorts uses for compilation. The main change seems to be adding -install_name to the linker call. Once that was done, everything fell into place.

I’m trying to get CouchDB up and running. Unfortunately, I’m getting no love even after building SpiderMonkey. Here’s what I’m getting:

a21772:~/apache-couchdb-0.8.0-incubating jlucas$ sudo couchdb
Apache CouchDB 0.8.0-incubating (LogLevel=info)
Apache CouchDB is starting.

dyld: NSLinkModule() error
dyld: Library not loaded: Darwin_DBG.OBJ/libjs.dylib
Referenced from: /usr/local/lib/couchdb/erlang/lib/couch-0.8.0-incubating/priv/lib/couch_erl_driver.so
Reason: image not found
Trace/BPT trap

I wish I could get it going as there are some interesting things I want to experiment with.

Mobile NetNewsWire

Ars Technica has a preview of a mobile version of NetNewsWire which looks pretty awesome from the screenshots.

What you won’t see here are sites or feeds that do not have any unread items. In fact, Brent has taken measures to ensure that on the mobile version of NetNewsWire, the user will only see what’s important to him. Many times during our interview, Brent mentioned that he was developing this application from the perspective of the individual who had only five minutes between tasks to take a quick look at his feeds. To that end, items you’ve read will disappear from the phone in 12 hours or so, keeping what you see on your iPhone to the bare minimum of important items.