Mahalo, Part III – Opentheory.net

Mahalo, as I noted in my launch coverage back in May, is “a search engine where users get hand-crafted portal-like results for common search queries.” Mahalo’s founder Jason Calacanis and I are both connected to Larry Sanger, so between that and being fairly intrigued by the idea I’ve kept my eye on the project.

I like Mahalo. It gives me good content for certain searches, and gosh darnit, it’s friendly. But my major philosophical problem with Mahalo, which I laid out here, is that it’s not really on the correct side of the technology curve. Having humans make portals for common search terms may work well compared to other options today, but algorithm-based search (e.g., Google) will keep getting better on almost every metric, whereas human-based search can’t*. Google is pretty much guaranteed to improve over time, because it can benefit from iterative algorithm tweaks, large and small, but Mahalo is based on non-iterative human labor and so has to work for every inch of improvement on every page (and simultaneously spend many man-hours battling page rot). I’m not going doom-and-gloom on Mahalo, because I think they’ve got some really good stuff going on, but let’s be honest: it’s a tough spot to be in.

*Granted, I expect Mahalo to do cool things with crowdsourcing additional links on topics and getting editors to filter and sort the best submissions. This sort of strategy has worked for Slashdot, but it’s an unproven technology in search. In some fashion I think Mahalo will be able to crowdsource freshness, but it may not be enough to keep every page as fresh as one would like.

Essentially, I believe Mahalo needs something up its sleeve to stay competitive in the future, something that can connect it to the world of the algorithm. History seems to bear this out: Yahoo’s human-crafted web directory looked as good on paper then as Mahalo’s human-crafted search portal strategy does now, but Google’s algorithms ate Yahoo’s lunch. Similarly, the human-powered DMOZ just couldn’t compete against Google. Google’s algorithmic approach allows it to have a “special sauce” of algorithms, if/then context switches, and exceptions that it can apply to almost every page on the internet, tweak at will, and combine with sophisticated intent predictors drawn from users’ searches to get pretty great search results. This is a very powerful, iterative strategy, and it allows Google to keep its results fresh without per-page human overhead. I would generalize this as ‘Technological progress in search is relentless and algorithms are impressively tweakable, scalable, and improvable compared to human evaluation, and Google is very good at using these facts to eat people’s lunches.’

Obviously I don’t have The Solution for how Mahalo should try to answer Google. I do think Mahalo is right to try something new, since nobody’s had any luck trying to out-Google Google, and I also think Mahalo’s on to something with its “search portal” idea. But I can’t get away from the feeling that turning one’s back on algorithms is trying to swim against the current.

So here’s my two-part idea: (1) build a back-end to allow Mahalo to automatically import tagged content into its search portals and (2) use algorithms to figure out what tagged content is worth bringing in.

(1) The tagged-content import system: this setup would draw in relevant, tagged content from elsewhere (like Flickr and YouTube, but blogs, too; the sky’s the limit) to automatically populate, e.g., a “content grab-bag” section of Mahalo portals, perhaps heavily weighted toward images, with thumbnails. If someone’s at the Mahalo portal for Fugu, most content tagged Fugu will be quite relevant, and there are often powerful indications of how good tagged content is (community ratings, views, links, etc.) which could give sorting methods plenty of fodder to work with. Basically, there’s lots of great tagged content of all stripes out there, tags enable more streamlined and fresher aggregation, and content aggregation is what Mahalo is all about. It seems like a pretty natural fit.
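To make the sorting idea a bit more concrete, here is a minimal Python sketch of how a “grab-bag” section might rank already-fetched tagged items by the quality signals mentioned above. Everything in it is an assumption for illustration: the `TaggedItem` fields, the `score` weights, and the `grab_bag` function are hypothetical, and fetching from Flickr, YouTube, or blogs is left out entirely.

```python
import math
from dataclasses import dataclass


@dataclass
class TaggedItem:
    title: str
    source: str        # e.g. "flickr", "youtube", "blog"
    tags: frozenset
    rating: float      # community rating, assumed normalized to 0..1
    views: int
    inbound_links: int


def score(item):
    # Log-damp the raw counts so one viral item does not swamp everything else.
    # The weights are arbitrary starting points, not tuned values.
    return (2.0 * item.rating
            + 1.0 * math.log1p(item.views)
            + 1.5 * math.log1p(item.inbound_links))


def grab_bag(items, topic, limit=5):
    """Return the top-scoring items that carry the topic tag (case-insensitive)."""
    wanted = topic.lower()
    matches = [i for i in items if wanted in {t.lower() for t in i.tags}]
    return sorted(matches, key=score, reverse=True)[:limit]
```

Calling `grab_bag(items, "Fugu")` on a mixed list would keep only the Fugu-tagged items and put the best-rated, most-viewed, most-linked ones first. The weights are exactly the kind of thing that could be tweaked once and applied to every portal at no extra per-page cost.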

(2) Using algorithms (not editors) to power the system: by using algorithms, adding this sort of content doesn’t add extra effort per page and it also automatically stays fresh. I think it’s the perfect complement to the human-crafted search results and would help Mahalo portals stay dynamic. Basically, it’s the perfect toehold for introducing and benefiting from algorithms in Mahalo. That said, you could certainly use the tagged content import system with editors making the calls instead of algorithms… it’d just be a lot more man-hours and it wouldn’t stay as fresh. And there might not be as much of a point, since the editor could go out, find, and add such content independently. What might be the most attractive option is a hybrid algorithm-editor solution, where algorithms automatically populate sections of Mahalo with tagged content once per day, but editors go through and veto anything that doesn’t fit. Algorithms could be tweaked to try to give more ‘relevant/informative’ content or more ‘fun’ content, depending on the topic (this might not be in the first system revision!).
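The hybrid option could look something like the following Python sketch: an algorithm proposes candidates on each daily run, editors keep a veto list, and a single dial shifts a portal toward “informative” or “fun” content. The `Candidate` fields, the `fun_weight` parameter, and the function names are all hypothetical, and how the two scores get estimated in the first place is assumed rather than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    url: str
    informative: float  # assumed 0..1 estimate of how informative the item is
    fun: float          # assumed 0..1 estimate of how fun the item is


def blended(candidate, fun_weight):
    # fun_weight=0.0 ranks purely by informativeness, 1.0 purely by fun.
    return (1.0 - fun_weight) * candidate.informative + fun_weight * candidate.fun


def daily_refresh(candidates, vetoed_urls, fun_weight=0.3, limit=10):
    """Rank candidates for one portal, skipping anything an editor has vetoed."""
    kept = [c for c in candidates if c.url not in vetoed_urls]
    kept.sort(key=lambda c: blended(c, fun_weight), reverse=True)
    return [c.url for c in kept[:limit]]
```

The point of the design is that the editor’s work shrinks to saying “no” occasionally: the veto set persists between runs, so a rejected item stays out even as the rest of the section refreshes itself.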

Eventually, Mahalo could branch out to become a hub which aggregates all sorts of tagged content on a given topic and displays them based on algorithm, user, and/or editor input… or people could submit various types of content and each Mahalo page could be somewhere between a Digg/Slashdot hybrid for almost any type of content on that specific topic (wouldn’t that be cool?). But that’s getting a little far afield. In the short-term, I think the combination of human-crafted results and being able to dynamically populate search portals with fun & relevant content would serve Mahalo well. It’s a combination that Google can’t have, and it’d also give Mahalo a leg up on Wikipedia as a portal to the freshest and most diverse content on a topic.

A tagging back-end would also help integrate some local search capabilities into Mahalo: Mahalo could determine a user’s location through various means (whether their search phrase included a location, by tracing their IP, by user preferences, or with a “where are you?” input box), and then return relevant content that was also tagged with their location (and synonyms).
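As a rough illustration of the local-search idea, this Python sketch tries the location signals in order from most to least explicit, then keeps only items tagged with both the topic and the place (or one of its synonyms). The `PLACE_SYNONYMS` table, the function names, and the single-word query matching are simplifying assumptions; the IP lookup is assumed to happen elsewhere and arrive as `ip_guess`.

```python
# Hypothetical lookup table; a real system would need a proper gazetteer.
PLACE_SYNONYMS = {
    "minneapolis": {"minneapolis", "mpls", "twin cities"},
    "honolulu": {"honolulu", "oahu"},
}


def location_from_query(query):
    """Return the first known place name that appears as a word in the query."""
    for word in query.lower().split():
        word = word.strip(".,!?")
        if word in PLACE_SYNONYMS:
            return word
    return None


def resolve_location(query, typed=None, preference=None, ip_guess=None):
    """Try the most explicit signal first and fall back to weaker guesses."""
    for candidate in (location_from_query(query), typed, preference, ip_guess):
        if candidate:
            return candidate.strip().lower()
    return None


def local_matches(items, topic, location):
    """items: (title, tags) pairs. Keep those tagged with the topic and the place."""
    place_tags = PLACE_SYNONYMS.get(location, {location})
    results = []
    for title, tags in items:
        lowered = {t.lower() for t in tags}
        if topic.lower() in lowered and lowered & place_tags:
            results.append(title)
    return results
```

Putting the IP-based guess last is a judgment call: it is the least reliable of the four signals, so anything the user actually typed or saved wins over it.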

Lest I get ahead of myself, this idea isn’t an automatic home run. It’s new and untested; I don’t know of anybody who’s done this sort of content aggregation based on tags. Largely, this idea would depend on figuring out clever ways to source and sort through tagged content (hey, if you didn’t want to roll your own algorithms right away maybe you could somehow use a Google appliance to help sort tagged content!), and the proof would be in the relevance of results. It might be a home run and it might be a dud. But given the potential benefits (especially in freshness) and near-effortless scalability to all Mahalo pages, it sure seems worth testing.

Bottom line: Mahalo’s pretty darn cool as-is (as a Minnesotan, I like how friendly it seems), and I don’t doubt that Jason has some further ideas up his sleeve. But using tags and algorithms to aggregate content, though no magic bullet, seems like a promising direction which would allow Mahalo to effortlessly go local, keep pages fresh, and start to take advantage of algorithms while still keeping true to its core vision of being “human-powered search”.

Edit, 7-13-07: It seems to me that a good strategy for Mahalo is fostering communities around its portal pages. Communities like dynamic content, thus my suggestion of dynamically bringing in tagged content (it’s an inexpensive way to keep things fresh). Per a previous email, I’d also suggest setting up some sort of ‘alert’ system, where Mahalo sets up automatic Google searches for all the terms they’ve done, and if any sites that weren’t originally in (for instance) the Google top 10 break into the Google top 5, that entry gets flagged as “there’s something happening on this search term that might make our portal stale — have a guide check it out”.
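The alert check itself is simple enough to sketch in Python. This version assumes the ranked result lists have already been collected somehow (no search API is called here) and that a baseline top 10 was saved when each portal was built; `newcomers` and `stale_portals` are hypothetical names.

```python
def newcomers(baseline_top10, current_top5):
    """Sites now in the top 5 that were not in the original top 10."""
    seen = set(baseline_top10)
    return [site for site in current_top5 if site not in seen]


def stale_portals(baseline, current):
    """Map each search term to the newcomer sites that should trigger a review.

    baseline: term -> ranked sites recorded when the portal was built.
    current:  term -> ranked sites from the latest automatic search.
    Terms with no newcomers are left out of the result.
    """
    flagged = {}
    for term, original in baseline.items():
        fresh = newcomers(original[:10], current.get(term, [])[:5])
        if fresh:
            flagged[term] = fresh
    return flagged
```

Anything that shows up in the returned mapping is a portal where “something is happening” on that search term, and the listed sites are what a guide would want to look at first.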