November 16, 2008

Wikipedia — side-effects

In a recent blog post, Nicholas Carr talked about the “centripetal web” — the increasing concentration and dominance of a few sites that seem to suck in links, attention and traffic. Carr says something interesting:

Wikipedia provides a great example of the formative power of the web’s centripetal force. The popular online encyclopedia is less the “sum” of human knowledge (a ridiculous idea to begin with) than the black hole of human knowledge. At heart a vast exercise in cut-and-paste paraphrasing (it explicitly bans original thinking), Wikipedia first sucks in content from other sites, then it sucks in links, then it sucks in search results, then it sucks in readers. One of the untold stories of Wikipedia is the way it has siphoned traffic from small, specialist sites, even though those sites often have better information about the topics they cover. Wikipedia articles have become the default external link for many creators of web content, not because Wikipedia is the best source but because it’s the best-known source and, generally, it’s “good enough.” Wikipedia is the lazy man’s link, and we’re all lazy men, except for those of us who are lazy women.

This is an important and oft-overlooked point: when judging whether something is good or bad, we need to look not just at the benefit it provides, but also at the opportunity cost. In the case of Wikipedia, there is at least some opportunity cost: people seeking answers might otherwise have gone to the “specialist sites” instead of to Wikipedia.

Of course, it’s possible to argue that specialist sites of the required quality do not exist, but the counter-argument is that specialist sites would have existed in greater number and quality if Wikipedia didn’t exist, or at any rate, if Wikipedia weren’t so much of a default. For instance, of all the free labor donated to Wikipedia, at least a fraction could have gone into developing and improving existing “specialist sites”. As I described in another blog post, the very structure of Wikipedia creates strong disincentives for competition.

Wikipedia, Mathworld and Planetmath

In 2003, when I was in high school and connected to the Internet over dial-up, I was delighted to find a wonderful resource called Mathworld. I devoured Mathworld for the hundreds of triangle centers it contained information on, and I eagerly awaited its expansion into other areas where it didn’t have much content. Because of the dial-up connection, I saved many of the pages for offline reference.

Later, in 2004, I discovered Planetmath. It wasn’t as beautifully done as Mathworld (Planetmath relies on a large contributor pool with little editorial control, whereas Mathworld has a small central team, headed by Eric Weisstein, that vets every entry before publication). But, perhaps because of less vetting and fewer editing restrictions, Planetmath had entries on many of the topics where Mathworld lacked entries. I found myself using both these resources, and appreciated the strengths and weaknesses of both models.

A little later in the year, I discovered Wikipedia. At the time, Wikipedia was fresh and young: some of the policies such as notability and verifiability had not been formulated in their current form, and many of the issues Wikipedia currently faces were non-existent. Wikipedia’s model was even more skewed towards ease of editing. It didn’t have the production-quality looks of Mathworld or the friendly fontfaces of Planetmath, but the page structure and category structure were pretty nice. Yet another addition to my repository, I thought.

Today, Wikipedia stands as one of the most dominant websites (it is ranked 8 in the Alexa rankings, for instance). More important, Wikipedia enjoyed steady growth both in contributions and usage until 2007 (contribution dropped a little in 2008). Planetmath and Mathworld, which fit Nicholas Carr’s description of “specialist sites”, haven’t grown nearly as visibly. They haven’t floundered either: they continue to be at least as good as they were four years ago, and they continue to attract similar amounts of traffic. But I get a nagging feeling that Wikipedia really did steal their thunder, and that in the absence of Wikipedia, there would have been more contributions to these sites and more usage of them.

The relation between Wikipedia and Planetmath is of particular note. In 2004, Wikipedia wasn’t great when it came to math articles — a lot of expansion needed to be done to make it competitive. Planetmath released all of its articles under the GNU Free Documentation License — the same license as Wikipedia. Basically, this meant that Wikipedia could copy Planetmath articles as long as the Wikipedia article acknowledged the Planetmath article as its source. Not surprisingly, many of the Planetmath articles on topics that Wikipedia didn’t have were copied. Of course, the Planetmath page was linked to, but we know where the subsequent action involved with “developing” the articles happened — Wikipedia.

Interestingly, Wikipedia acknowledged its debt to Planetmath: at some point, the donations page of Wikipedia suggested donating to Planetmath, a resource Wikipedia credited for helping it get started with its mathematics articles (I cannot locate this page now, but it is possibly still lying around somewhere). Planetmath, for its part, introduced unobtrusive Google ads in the top left column, an indicator that it is perhaps not receiving enough donations.

Now, most of the mathematics students I meet are aware of Mathworld and Planetmath and look these up when using the Internet; they haven’t given up these resources in favor of Wikipedia. But they, like me, started using the Internet at a time when Wikipedia was not in a position of dominance. Will new generations of Internet users be totally unaware of the existence of specialist sites for mathematics? Will there be no interest in developing and improving such sites, for fear that the existence of an all-encompassing behemoth “encyclopedia” renders such efforts irrelevant? It is hard to say.

(Note: I, for one, am exploring the possibility of new kinds of mathematics reference resources, using the same underlying software that powers Wikipedia, namely MediaWiki. For instance, I’ve started the Group properties wiki.)

The link-juice to Wikipedia

As Nick Carr pointed out in his post:

Wikipedia articles have become the default external link for many creators of web content, not because Wikipedia is the best source but because it’s the best-known source and, generally, it’s “good enough.” Wikipedia is the lazy man’s link, and we’re all lazy men, except for those of us who are lazy women.

In other words, Wikipedia isn’t winning its link-juice through the merit of its entries; it is winning links through its prominence and dominance and through people’s laziness or inability to find alternative resources. Link-juice has two consequences. The direct consequence is that the more people link to something, the more it gets found out by human surfers. The indirect consequence is that Google PageRank and other search engine ranking algorithms make intensive use of the link structure of the web, so a large number of incoming links increases the rank of a page. This is a self-reinforcing loop: the more people link to Wikipedia, the higher Wikipedia pages rank in searches, and the higher Wikipedia pages rank in searches, the more likely it is that people using web searches to find linkable resources will link to the Wikipedia article.
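
The ranking half of this loop can be made concrete with a toy calculation. What follows is a minimal sketch of the textbook power-iteration form of PageRank, not Google's production algorithm; the five-page web, the page names, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration. For simplicity the two destination pages have no outgoing links at all.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page name to the list of pages it links to.
    Returns a dict mapping page name to its rank (ranks sum to 1).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three blogs, one default destination, one specialist site.
toy_web = {
    "blog_a": ["wikipedia"],
    "blog_b": ["wikipedia"],
    "blog_c": ["wikipedia", "specialist"],
    "wikipedia": [],
    "specialist": [],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

In this toy web the default destination ends up with the highest rank and the specialist site trails it, purely because of how many links point at each; the quality of either page never enters the computation.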

To add to this, search engines ignore external links from Wikipedia articles when computing rankings, because of Wikipedia’s settings. This is ostensibly a move to avoid spam links, but it makes Wikipedia a net absorber of link-juice as far as search engine ranking is concerned: rank flows in and does not flow back out.
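
As a rough illustration, assume the setting in question is the rel="nofollow" attribute on external links, and assume a crawler that simply skips such links when building its link graph (real search engines may treat the attribute differently). The sample HTML and the class name below are hypothetical.

```python
from html.parser import HTMLParser


class FollowedLinkCollector(HTMLParser):
    """Separates links a ranking crawler would count from those marked nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []  # links that would pass on link-juice
        self.ignored = []   # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.ignored.append(href)
        else:
            self.followed.append(href)


# Hypothetical fragment of an article page: one external link, one internal link.
sample_html = """
<p>See <a href="http://example.org/specialist" rel="nofollow">a specialist site</a>
and <a href="/wiki/Metric_space">an internal article</a>.</p>
"""

collector = FollowedLinkCollector()
collector.feed(sample_html)
print("counted for ranking:", collector.followed)
print("ignored for ranking:", collector.ignored)
```

Under these assumptions only the internal link contributes to anyone's rank, so the specialist site gets a visitor now and then but no ranking credit.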

In addition, the way people link to Wikipedia is also interesting. Often, the anchor text of a link to a Wikipedia article gives no indication that the link goes to Wikipedia. Rather, the anchor text simply gives the article name. This sends the message to readers that the Wikipedia article is the first place to look something up.

Even experienced and respected bloggers do this. For instance, Terence Tao, a former medalist at the International Mathematical Olympiad and a mathematician famous for having settled a conjecture regarding arithmetic progressions of primes, links copiously to Wikipedia in his blog posts. To be fair, he also links to articles on Planetmath and papers on the arXiv in cases where these resources offer better information than the Wikipedia article. Nonetheless, the sheer volume of links suggests that not every link to a Wikipedia article reflects the Wikipedia article genuinely being the best resource on the web for that content.

What can we do about it?

Ignoring a strong centripetal influence, such as an all-encompassing knowledge source, does not make us immune to its pull. There is a strong temptation to use Wikipedia as a “first source” for information. To counter this pull, it is important both to understand the causes behind it and to question its inevitability.

The success of a quick reference resource like Wikipedia stems from many factors, but two stand out: the desire to learn and grow, and laziness. Our curiosity leads us to look for new information, and our laziness prevents us from exerting undue effort in that regard. Wikipedia capitalizes on both traits in its readers (quick and dirty access to lots of stuff immediately), its contributors (easy edit-this-page), and its linkers (satisfying reader curiosity by providing web links, but using Wikipedia instead of other sources thanks to laziness). Wikipedia is what I call a “pinpoint resource”: something that very quickly provides one-stop access to very specific queries over a large range of possibilities.

For something to compete with Wikipedia, it must cater to these fundamental attributes. It must be quick to use, provide quality information, and encourage exploration without making things too hard. It must be modular and easily pinpointable. This doesn’t mean that every resource should be modular and easily pinpointable; there are other niches that don’t compete with Wikipedia. But to compete for the “quick-and-dirty” vote, a site has to offer at least some of what Wikipedia offers.

Of course, one question that arises naturally at this point is: isn’t Wikipedia “good enough” to satisfy passing curiosities? I agree that there is usually no harm in using Wikipedia when the alternative is ignoring one’s curiosity. But I emphatically disagree with the idea that funneling people’s passing curiosities, and their desire to learn new things and teach others, through Wikipedia is the best we can do. Passing curiosities can form the basis of enduring and useful investigations, and the kind of resource people turn to at first can determine how the initial curiosity develops. For this reason, if Wikipedia is siphoning off attention from specialist sites that do a better job, not just of providing the facts, but of fostering curiosity and inviting exploration, then there is a loss at some level.



  1. You seem to think a lot like me, the difference is I don’t write it down. I’ve been thinking about something like your groupprops wiki for a long time (although it never occurred it could be done successfully with mediawiki). Good work.

    Comment by bens — January 18, 2009 @ 8:00 pm

  2. The central problem I think is :
    A] We like to have a one-stop shop for all our questions
    B] We would like it to be comprehensive

    Wikipedia satisfies A] but not B]. Mathworld and PlanetMath satisfy B] but not A]. A truly good solution would be something like a federated system, wherein there is a centrally located query engine which pulls contents from specialist sites like Mathworld and PlanetMath for display. Such a system should potentially satisfy both A] and B] and I am sure we can work out a donation/revenue model to support both the specialist sites and central federation engine. However, this is a really really hard problem. Starting such a system even for a few areas like mathematics and physics might be a great start.

    Comment by Arun — January 27, 2009 @ 10:44 am

  3. Hi Arun,

    I’m indeed working on such a system of wikis for mathematical topics. I’ve been at it for about two years, but it is still in its very initial stages. You can check it out at:


    The goal of the Reference Guide is to be the one-stop shop where you can come with any query and it’ll give a dictionary-like meaning along with links to detailed wikis in specific subjects. For instance, you can look at the page:


    for a sample of how I’m hoping it’ll eventually work out.

    Comment by vipulnaik — January 29, 2009 @ 3:40 am

  5. @Vipul,
    Hi! How is the data being generated for that subwiki site?

    Comment by Arun — January 30, 2009 @ 9:00 am

  6. @Vipul,
    Just for clarification, I had something like [1] in my mind.

    [1] http://en.wikipedia.org/wiki/Federated_search

    Comment by Arun — January 30, 2009 @ 9:03 am

  7. Can you give a few examples where (say) Tao linked to Wikipedia but the Wikipedia article was not actually good? 🙂

    [I read a comment somewhere else noting that almost everything Terence Tao linked to on Wikipedia was a good article, while most mathematics articles on Wikipedia weren’t very good… the simple explanation is that for the linked topics, Wikipedia was indeed a good resource, which is why it was linked to.]

    Comment by Shreevatsa — February 3, 2009 @ 9:59 pm

  8. Hi Shreevatsa,

    I am not sure what “actually good” means here. Do you mean, good relative to other Wikipedia articles, or good relative to other online references, or good relative to all existing references?

    It did seem to me that Tao actually checks the Wikipedia entry before linking to it. I think this is evidenced by the fact that in some cases, he links to the Planetmath entries, or other entries, or puts excerpts from books.

    To check up on your comment, I decided to look at some Wikipedia entries linked to by Terence Tao from what’s currently his latest blog post (245B, Notes 9). He links to Wikipedia articles on Lebesgue differentiation theorem, complete metric space, dense set, nowhere dense set, Baire category theorem, and many more. The Wikipedia entry on Lebesgue differentiation theorem seems reasonable in itself, and fits in reasonably well with what Tao seems to want for his work. Good one. As for complete metric space, the Wikipedia entry probably does a reasonable job by itself, though it does not seem to fit in with Tao’s flow too well. The entry on nowhere dense set is short and not outstanding, but again does a reasonable job for Tao’s flow. Ditto for dense set: the Wikipedia page isn’t what I’d consider great, but it is okay for Tao’s purposes. Baire category theorem on Wikipedia has a lot of useful information, but I’m not too fond of the way it has been presented.

    So anyway, I looked up the corresponding definitions on Mathworld. The Mathworld entries for dense and nowhere dense seem shorter, but to me they seem to be better presented and more useful to link to for serious math students. But Mathworld definitely doesn’t win hands down and it is possible that Tao made a conscious choice of the Wikipedia entry after comparing with the Mathworld entry.

    A little later down, I see Terence Tao linking to tempered distribution: here at last is some evidence that perhaps Tao isn’t inspecting the outward Wikipedia links. Tao’s link lands on a disambiguation page, which was last edited in July 2008. Another link that seems extremely fishy is the link to harmonic analysis, where reading the Wikipedia entry doesn’t seem to help directly with understanding the material. On the other hand, when it comes to the Fefferman-Stein decomposition theorem, Tao links to the paper’s MathSciNet page rather than to a (non-existent) Wikipedia entry, and Tao does point his link to Radon-Nikodym derivative when mentioning the Radon-Nikodym theorem. Some of Tao’s later Wikipedia links are also unhelpful to people trying to grasp the material for the first time, but may add value for people who have understood the bulk of the material.

    All in all, I don’t know how Tao is choosing Wikipedia entries — whether he is comparing them with other online references, whether he is reading the pages, whether there are many other things for which he chose not to link because of the poor quality of the entry. But the few examples I gave above show that Tao isn’t inspecting each entry very critically — and I’d be surprised if he did, given the number of links he’s given. Whatever the case, readers of his blog (other than me) do seem to get the impression of link-by-default to Wikipedia: for instance, look at the comments section of this blog post where an anonymous commenter suggests a convention indicating links to Wikipedia for all underlined terms in the “print” version of the blog.

    Comment by vipulnaik — February 5, 2009 @ 3:04 am

  9. Hi Shreevatsa,

    For more about link-by-default, the way I would handle it, if indeed I planned to link to Wikipedia, is to write something like:

    Complete metric space (Wikipedia entry)

    If there exist good entries at multiple sources, I’d comma-separate them, e.g.,

    Complete metric space (Wikipedia entry, Mathworld entry)

    On the plus side, it reduces the canonicity or “by-default” appearance. On the minus side, it consumes more space.

    Comment by vipulnaik — February 5, 2009 @ 3:08 am

  10. I think you’re reading too much “canonicity” into links which does not exist. People link randomly to adjectives, adverbs, whatever they like; even things almost unrelated to the anchor text. The way many people follow links is that they look at the URL in the status bar, see where it’s a link to (and whether it’s a PDF, etc. :)), then follow the link if they feel like it. It is redundant to say “Wikipedia” or “Mathworld” in the anchor text when it’s visible in the link.
    The whole argument is like saying that people shouldn’t link to books on Amazon because there are other book sources and Amazon is not the only one. 🙂 It is just one convenient source of information the author chose; readers can ignore it or search somewhere else if they want.

    Comment by Shreevatsa — February 20, 2009 @ 1:25 am

  11. Hi Shreevatsa,

    Actually, I don’t see anything ludicrous about the Amazon example — that seems to be another example of canonicity. And the whole point of what I’m saying about canonicity is that although people link randomly, the distribution of how they link is far from uniform :). Rather, random linking biases usually reinforce links to the more easy-to-find, already “canonical” resources, thus making them even more canonical.

    As for people hovering over links to read where they point and then determining whether to go there or not — well, not everybody does that :). But even if they do, the point I’m trying to make is that when people just link up to a default source such as Amazon or Wikipedia, without it being clear whether it is linked to simply on account of it being a default or on account of their finding it good in the specific instance, some information on how the link choice was made is lost. And this can lead to both humans and bots concluding that the particular specific instance (e.g., the specific Wikipedia entry) has been “recommended” by the linker.

    Comment by vipulnaik — February 20, 2009 @ 2:07 am

