Starting a Newspaper

Seth Godin writes about how he would start a newspaper with roughly six people. What about the institutions that already have many more people writing for publication?

When I was at The Times, a few of us had this idea, trying to capitalize on everyone’s belief that focusing on local news is the way for a newspaper to survive. Since everyone lives in a neighborhood, why not use that for the beats? Make sure everyone has a laptop that can handle the reporting and any video or photo editing that needs to happen, plus a camera that can also shoot video.

Editors should lay down the law that anyone seen in the office more than once a week would be in serious trouble. Instead, reporters should be talking to local businesses about what’s going on, interviewing local high school athletes and becoming a known entity at all city meetings. This would allow local stories to be reported in a much different way. Blogs could be started for cities that would become must-reads for everyone involved. A few times a week, have blog entries reverse-published into the print edition, but overall make everything focused on the Web.

Would this work? Would this save a newspaper? Who knows, but really, at this point, what do they have to lose?

The end of Out of Town News

Dave brings word of the closing of Out of Town News in the heart of Harvard Square. While totally understandable, given the current shape newspapers are in, it still is pretty sad for me on a personal level.

When I lived in Cambridge, OoTN was a daily stop on my way to the T. I grabbed the Globe or the New York Times and headed into work. Once my daughter was born, we had a father-daughter walk each Sunday morning. She was in a stroller or a Baby Bjorn and we walked to get the Sunday New York Times. It allowed my wife to have a few minutes’ peace without a newborn in the house. I look back very, very fondly on that time, especially with my daughter now being almost a decade old.

Though some are trying to keep it going, it doesn’t look good for Out Of Town News. I’m sure others have memories of it but these are mine.

Hadoop’ing at My Desk

Last week, I started scrounging around the office for some unused PCs. Unfortunately, there were more than just a few because of all the things going on at the Times. I grabbed three, put them on my desk and spent the rest of the day installing Ubuntu on them. Everything went really smoothly and I was very pleasantly surprised that our IT department didn’t give me a hard time for wanting a switch in the office.

I used this post to help set up a Hadoop cluster. It went really smoothly and before I knew it, the future was sitting on my desk.

Why the future? The amount of data used by companies is increasing way beyond what it used to be and systems like Hadoop allow for that data to be dealt with in more humane ways than stuffing it into some sort of database and hoping your SQL-fu can slice and dice.

Of course, a three-node Hadoop cluster isn’t that impressive when you compare it to the 4,000-node one used by Yahoo!. But that’s OK since this is just the beginning.

What am I doing with all this power, you ask? Well, let me give you an example. I have 10 years’ worth of archives loaded into the cluster. As part of the Articles project, I turned each text file into an Atom representation, which has allowed us to do various things with the metadata. At first, I put each individual file into HDFS (the Hadoop Distributed File System), but then I would have needed to write some additional code for Hadoop to treat each file as a single record, as opposed to its default of reading each file line by line. Eventually I’ll do that, but it would have been yak shaving at the beginning.

Instead, I collapsed the files from each month into one, with each line being a story. This allowed Hadoop’s default splitter to go crazy. One of the first Map/Reduce jobs I wrote was to go through each story, find all of the A1 (front page) stories and see who wrote them. That would be the Map part of it, while the Reduce piece added all of the instances together so you could easily see the leaderboard. I mentioned this to one of my colleagues and he warned me that having that data fall into the wrong hands could destroy the newsroom. I think he was kidding but I’m not that sure.
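
Here’s a rough sketch of that A1 job in plain Python rather than the actual Hadoop code. It assumes each line is one story with made-up tab-separated fields (page, byline, headline); the real archive lines were Atom, and their layout isn’t shown here. The function names and the sample lines are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical one-story-per-line format: page<TAB>byline<TAB>headline.
# These sample lines are invented, not real archive data.
SAMPLE_LINES = [
    "A1\tJane Doe\tCity budget passes",
    "B3\tJohn Roe\tLocal team wins",
    "A1\tJane Doe\tStorm closes schools",
    "A1\tJohn Roe\tTransit fares rise",
]


def map_story(line):
    """Map step: emit (byline, 1) for each front-page (A1) story."""
    page, byline, _headline = line.split("\t", 2)
    if page == "A1":
        yield byline, 1


def reduce_counts(pairs):
    """Reduce step: add up the emitted counts for each byline."""
    totals = defaultdict(int)
    for byline, count in pairs:
        totals[byline] += count
    return dict(totals)


def front_page_leaderboard(lines):
    """Run map then reduce, sorted by count (highest first), then name."""
    pairs = (pair for line in lines for pair in map_story(line))
    totals = reduce_counts(pairs)
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for byline, count in front_page_leaderboard(SAMPLE_LINES):
        print(f"{byline}\t{count}")
```

On a real cluster the map and reduce steps would run as separate tasks spread across the nodes; here they run in one process just to show the shape of the job.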

Other tests have been seeing what the breakdown of sections (News, Sports, Business, etc.) has been on the front page, what keywords have been used the most across all 10 years as well as on the front page and, more recently, using the keywords to try to train a Naive Bayes classifier using Mahout. That one didn’t really work well but the idea still intrigues me.
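
The section and keyword tallies boil down to the same kind of counting. A minimal sketch in plain Python, again assuming a made-up line format (page, section, comma-separated keywords) that stands in for the real Atom metadata; the sample lines and the `tally` function are hypothetical, and the Mahout classifier isn’t covered here.

```python
from collections import Counter

# Hypothetical format: page<TAB>section<TAB>comma-separated keywords.
# These sample lines are invented, not real archive data.
SAMPLE_LINES = [
    "A1\tNews\tbudget,city council",
    "C2\tSports\tbaseball",
    "A1\tSports\tbaseball,playoffs",
    "A1\tNews\tbudget",
]


def tally(lines):
    """Count front-page sections, all keywords, and front-page keywords."""
    front_sections = Counter()
    all_keywords = Counter()
    front_keywords = Counter()
    for line in lines:
        page, section, keywords = line.split("\t", 2)
        words = [word.strip() for word in keywords.split(",") if word.strip()]
        all_keywords.update(words)
        if page == "A1":
            front_sections[section] += 1
            front_keywords.update(words)
    return front_sections, all_keywords, front_keywords


if __name__ == "__main__":
    sections, everywhere, front = tally(SAMPLE_LINES)
    print("Front-page sections:", sections.most_common())
    print("Top keywords overall:", everywhere.most_common(3))
    print("Top keywords on A1:", front.most_common(3))
```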

In all the talk of the demise of the newspaper, one thing still bothers me. Newspapers are among the few organizations that have real information about the past, information beyond just the facts. Doing things with this information can only help find the proper place for newspapers and the data they’ve created.

Hadoop isn’t some sort of cure-all for the woes we face, but I think it gives a glimpse of how a future news organization could use data to do incredible things and give users a much different relationship with the news, one they would renew every day.

Living Behind the Pay Wall

Techdirt has two really good posts today about making information hard to find when customers are looking for it.

The first deals directly with it by looking at newspapers holding their archives hostage by putting up a pay wall in front of them after a certain amount of time has passed. This is silly and yes, I know we do it officially but I’ve been fighting that since I started here. It’s one of the main reasons why a few of us created this. It seems pretty easy to me to see the benefit of doing this. We haven’t promoted anything about the article server at all yet people find us thru search engines. It’s really quite simple.

The second post is about Howard Stern and his shrinking influence since his move off of free radio and onto satellite. It’s based on one of our articles.

Overall, you either make your information easily found by users or they will route around you, looking elsewhere and more than likely ignoring you forever. It’s your choice.

Thought of the Day

Would it be such a terrible thing? If newspapers were managed by new groups of people with no real romantic link to the glory days of newspapers, and freed from management grown fat and lazy on the easy profits of the glory days of American local newspapers maybe titles can innovate again and start thinking about how they serve audiences better in print and online rather than arguing about trivial details of the content of dying Monday to Friday newspapers or creating unreadable wrappers for supermarket inserts on Sundays.

[ via ]

Some things I’ve liked recently

Here are a few things I’ve come across over the last few days which are pretty nifty…

Huffington Post Chicago

The Huffington Post Chicago site has launched and so far I’m impressed. Looks like a nice collection of opening day stories talking about the greatness of Chicago.

I found a few I liked: “The Newspaper is Dead, Long Live the Newspaper,” “Blackhawk Down: Chicago’s Forgotten Franchise” and “Chicago Tribune’s Social Media Evolution.”

One interesting thing is the right rail being mainly links to the local bloggers. I think that’s an incredible way of generating goodwill and probably bringing people back again and again. The key is distribution and aggregation. It’s something I try to preach here though it doesn’t always seem like people are listening.

Opening the Archives

From Dan Gillmor

First is to open the archives, with permalinks on every story in the database. Newspapers hold more of their communities’ histories and all other media put together, yet they hoard it behind a paywall that produces pathetic revenues and keeps people in the communities from using it — as they would all the time — as part of their current lives. The revenues would go up with targeted search and keyword-specific ads on those pages, I’m absolutely convinced. But an equally important result would be to strengthen local ties.

Oh, you mean something like this?