What Is Research?

November 16, 2008

Wikipedia — side-effects

In a recent blog post, Nicholas Carr talked about the “centripetal web” — the increasing concentration and dominance of a few sites that seem to suck in links, attention and traffic. Carr says something interesting:

Wikipedia provides a great example of the formative power of the web’s centripetal force. The popular online encyclopedia is less the “sum” of human knowledge (a ridiculous idea to begin with) than the black hole of human knowledge. At heart a vast exercise in cut-and-paste paraphrasing (it explicitly bans original thinking), Wikipedia first sucks in content from other sites, then it sucks in links, then it sucks in search results, then it sucks in readers. One of the untold stories of Wikipedia is the way it has siphoned traffic from small, specialist sites, even though those sites often have better information about the topics they cover. Wikipedia articles have become the default external link for many creators of web content, not because Wikipedia is the best source but because it’s the best-known source and, generally, it’s “good enough.” Wikipedia is the lazy man’s link, and we’re all lazy men, except for those of us who are lazy women.

This is an important and oft-overlooked point: when judging whether something is good or bad, we need to look not just at the benefit it provides, but also at the opportunity cost. In the case of Wikipedia, there is at least some opportunity cost: people seeking those answers might otherwise have gone to the “specialist sites” instead of to Wikipedia.

Of course, it’s possible to argue that specialist sites of the required quality do not exist, but it can again be argued, in a counter-response, that specialist sites would have existed in greater number and greater quality if Wikipedia didn’t exist, or at any rate, if Wikipedia weren’t so much of a default. It might be argued, for instance, that of all the free labor donated to Wikipedia, at least a fraction of it could have gone into developing and improving existing “specialist sites”. As I described in another blog post, the very structure of Wikipedia creates strong disincentives for competition.

Wikipedia, Mathworld and Planetmath

In 2003, at a time when I was in high school and used a dial-up to connect to the Internet, I was delighted to find a wonderful resource called Mathworld. I devoured Mathworld for all the hundreds of triangle centers it contained information on, and I eagerly awaited the expansion of Mathworld in other areas where it didn’t have much content. I was on a dial-up connection, so I saved many of the pages for referencing offline.

Later, in 2004, I discovered Planetmath. It wasn’t as beautifully done as Mathworld (Planetmath relies on a large contributor pool with little editorial control, whereas Mathworld has a small central team, headed by Eric Weisstein, that vets every entry before publication). But, perhaps because of less vetting and fewer editing restrictions, Planetmath had entries on many topics that Mathworld did not cover. I found myself using both resources, and I appreciated the strengths and weaknesses of both models.

A little later in the year, I discovered Wikipedia. At the time, Wikipedia was fresh and young: some of the policies, such as notability and verifiability, had not been formulated in their current form, and many of the issues Wikipedia currently faces were non-existent. Wikipedia’s model was even more skewed towards ease of editing. It didn’t have the production-quality looks of Mathworld or the friendly fonts of Planetmath, but the page structure and category structure were pretty nice. Yet another addition to my repository, I thought.

Today, Wikipedia stands as one of the most dominant websites (it is ranked eighth in the Alexa rankings, for instance). More importantly, Wikipedia enjoyed steady growth both in contributions and usage until 2007 (contributions dropped a little in 2008). Planetmath and Mathworld, which fit Nicholas Carr’s description of “specialist sites”, haven’t grown as visibly. They haven’t floundered either: they continue to be at least as good as they were four years ago, and they continue to attract similar amounts of traffic. But I get a nagging feeling that Wikipedia really did steal their thunder, and that in the absence of Wikipedia there would have been more contributions to these sites, and more usage of them.

The relation between Wikipedia and Planetmath is of particular note. In 2004, Wikipedia wasn’t great when it came to math articles — a lot of expansion needed to be done to make it competitive. Planetmath released all of its articles under the GNU Free Documentation License — the same license as Wikipedia. Basically, this meant that Wikipedia could copy Planetmath articles as long as the Wikipedia article acknowledged the Planetmath article as its source. Not surprisingly, many of the Planetmath articles on topics that Wikipedia didn’t have were copied. Of course, the Planetmath page was linked to, but we know where the subsequent action involved with “developing” the articles happened — Wikipedia.

Interestingly, Wikipedia acknowledged its debt to Planetmath — at some point in time, the donations page of Wikipedia suggested donating to Planetmath, a resource Wikipedia credited for helping it get started with its mathematics articles (I cannot locate this page now, but it is possibly still lying around somewhere). Planetmath, on its part, introduced unobtrusive Google ads in the top left column — an indicator that it is perhaps not receiving enough donations.

Now, most of the mathematics students I meet are aware of Mathworld and Planetmath and look these up when using the Internet; they haven’t given up these resources in favor of Wikipedia. But they, like me, started using the Internet at a time when Wikipedia was not in a position of dominance. Will new generations of Internet users be totally unaware of the existence of specialist sites for mathematics? Will there be no interest in developing and improving such sites, for fear that the existence of an all-encompassing behemoth “encyclopedia” renders such efforts irrelevant? It is hard to say.

(Note: I, for one, am exploring the possibility of new kinds of mathematics reference resources built on MediaWiki, the same software that powers Wikipedia. For instance, I’ve started the Group properties wiki.)

The link-juice to Wikipedia

As Nick Carr pointed out in his post:

Wikipedia articles have become the default external link for many creators of web content, not because Wikipedia is the best source but because it’s the best-known source and, generally, it’s “good enough.” Wikipedia is the lazy man’s link, and we’re all lazy men, except for those of us who are lazy women.

In other words, Wikipedia isn’t winning its link-juice through the merit of its entries; it is winning links through its prominence and dominance and through people’s laziness or inability to find alternative resources. Link-juice has two consequences. The direct consequence is that the more people link to something, the more often human surfers find it. The indirect consequence is that Google PageRank and other search engine ranking algorithms make intensive use of the link structure of the web, so a large number of incoming links increases the rank of a page. This is a self-reinforcing loop: the more people link to Wikipedia, the higher Wikipedia pages rank in searches, and the higher Wikipedia pages rank in searches, the more likely it is that people using web searches to find linkable resources will link to the Wikipedia article.
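The loop can be illustrated with a toy model. What follows is only a sketch under loud assumptions: a textbook, simplified PageRank (power iteration with a damping factor), a made-up web of two reference pages called `encyclopedia` and `specialist`, and a hypothetical rule that each new blogger links to whichever of the two currently ranks higher. Real search engines and real linkers are far more complicated.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank uniformly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def simulate_linking(rounds=5):
    """Hypothetical linker behavior: each new blog links to the top-ranked source."""
    # Assumed starting point: one blog already links to the encyclopedia.
    web = {"encyclopedia": [], "specialist": [], "blog_0": ["encyclopedia"]}
    for i in range(1, rounds + 1):
        rank = pagerank(web)
        top = max(["encyclopedia", "specialist"], key=rank.get)
        web[f"blog_{i}"] = [top]
    return web


if __name__ == "__main__":
    web = simulate_linking()
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:14s} {score:.3f}")
```

In this toy model a single early link is enough: the page that starts with a one-link lead collects every later link, regardless of which page has the better content.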

To add to this, external links from Wikipedia articles are ignored by search engines because of Wikipedia’s own settings. This is ostensibly a move to avoid spam links, but it makes Wikipedia a sink for link-juice as far as search engine ranking is concerned: rank flows in through incoming links and is not passed on.
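One common mechanism for a setting of this kind is the `rel="nofollow"` attribute on links. Assuming that is what is meant here, the sketch below shows how a crawler that honors the attribute might separate the links it counts from the links it ignores. The class name, the helper function, and the example URLs are all hypothetical; only Python’s standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class LinkSorter(HTMLParser):
    """Sorts <a href> links into followed and ignored (rel="nofollow")."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.ignored.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, ignored) link lists for an HTML fragment."""
    sorter = LinkSorter()
    sorter.feed(html)
    return sorter.followed, sorter.ignored


# Made-up fragment: two outbound links marked nofollow, one ordinary link.
SAMPLE_HTML = (
    '<p><a href="https://specialist.example/triangle-centers" rel="nofollow">source</a> '
    '<a href="https://encyclopedia.example/Triangle">internal</a> '
    '<a rel="noopener NoFollow" href="https://other.example/">other</a></p>'
)

if __name__ == "__main__":
    followed, ignored = split_links(SAMPLE_HTML)
    print("counted:", followed)
    print("ignored:", ignored)
```

Under this assumption, a specialist site cited from such a page gains readers who click through, but no ranking credit from the link itself.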

In addition, the way people link to Wikipedia is also interesting. Often, the anchor text of a link to a Wikipedia article gives no indication that the link goes to Wikipedia. Rather, the anchor text simply gives the article name. This sends the message to readers that the Wikipedia article is the first place to look something up.

Even experienced and respected bloggers do this. For instance, Terence Tao, a former medalist at the International Mathematical Olympiad and a mathematician famous for having settled a conjecture regarding primes in arithmetic progressions, links copiously to Wikipedia in his blog posts. To be fair, he also links to articles on Planetmath and papers on the arXiv in cases where these resources offer better information than the Wikipedia article. Nonetheless, the copious linking suggests that not every link to a Wikipedia article is there because the Wikipedia article is genuinely the best resource on the web for that content.

What can we do about it?

Ignoring a strong centripetal influence, such as an all-encompassing knowledge source, does not make us immune to its pull. There is a strong temptation to use Wikipedia as a “first source” for information. To counter this pull, it is important both to understand the causes behind it and to question its inevitability.

The success of a quick reference resource like Wikipedia stems from many factors, but two stand out: the desire to learn and grow, and laziness. Our curiosity leads us to look for new information, and our laziness keeps us from exerting undue effort in finding it. Wikipedia capitalizes on both traits in its readers (quick and dirty access to lots of stuff immediately), its contributors (easy edit-this-page), and its linkers (satisfying reader curiosity by providing web links, but using Wikipedia instead of other sources thanks to laziness). Wikipedia is what I call a “pinpoint resource”: something that provides very quick, one-stop access to answers for very specific queries over a large range of possibilities.

For something to compete with Wikipedia, it must cater to these fundamental attributes. It must be quick to use, provide quality information, and encourage exploration without making things too hard. It must be modular and easily pinpointable. This doesn’t necessarily mean that everything should be modular and easily pinpointable, since there are other niches that don’t compete with Wikipedia. But to compete for the “quick-and-dirty” vote, a site has to offer at least some of what Wikipedia offers.

Of course, one question arises naturally at this point: isn’t Wikipedia “good enough” to satisfy passing curiosities? I agree that there is usually no harm in using Wikipedia, compared with ignoring one’s curiosity. But I emphatically disagree with the idea that funneling people’s passing curiosities, and their desire to learn new things and teach others, through Wikipedia is the best we can do. Passing curiosities can form the basis of enduring and useful investigations, and the kind of resource people turn to at first can determine how the initial curiosity develops. For this reason, if Wikipedia is siphoning off attention from specialist sites that do a better job, not just of providing the facts, but of fostering curiosity and inviting exploration, then there is a loss at some level.
