What Is Research?

May 28, 2010

Planetmath and Mathworld losing out to Wikipedia

Filed under: Web structure,Wikipedia,Wikis — vipulnaik @ 10:59 pm

As I’ve mentioned earlier (such as here), it has seemed to me for some time that both Planetmath and Mathworld are losing out to Wikipedia as Internet-based mathematical references. I don’t expect that there has been an absolute decline in the traffic to these websites: the Internet has grown by leaps and bounds, so a website has to make a fairly active effort to lose users. But I do expect that more and more of the new users who are coming in are treating Wikipedia as their first source of information.

From the preliminary results of a recent SurveyMonkey survey: out of 29 respondents (most of them people pursuing or qualified in mathematics), 24 claimed to have used Wikipedia often, and the remaining 5 to have used it occasionally for mathematical reference information. For Planetmath, 22 people claimed to have used it occasionally, and 7 to have heard of it but not used it. For Mathworld, 5 people claimed to have used it often, 16 to have used it occasionally, 7 to have heard of it but not used it, and 1 to have never heard of it.

In response to a question on how Planetmath and Mathworld compare with Wikipedia, the nine free responses were:

  • Planetmath now loads too slowly, so I seldom use it. Even when I did use it often, I found that the articles tended either to be too incomplete, or too tailored to people who are already experts. Many Mathworld pages are still much better than many Wikipedia pages, but Wikipedia is more comprehensive.

  • planet math is slow!!! mathworld has more graphics and less maths.

  • Material is generally better presented on Mathworld, but the topics are more limited.

  • Not as much information as Wikipedia and not as well arranged

  • I usually find Wikipedia to be my first source and only go to Mathworld or Planetmath if wikipedia fails me. I guess that means Wikipedia is better.

  • Planetmath is dying, Mathworld is static.

  • I find wikipedia much more useful than Mathworld. Mathworld’s pages are very technical, which is not what I am looking for on the internet usually. Usually I am looking for someone’s nice conceptual understanding of a topic or definition (through nice examples), and Wikipedia usually has lots of these.

  • Planetmath is quite useful to find proofs. Mathworld is very specialised, but it has a few nice bits of information sometimes. They seem to both be quite stagnant compared to Wikipedia.

  • They can’t keep up. There was probably a time when PlanetMath was a better reference than Wikipedia, but it’s fading fast. I think their ownership model isn’t conducive to long term quality.

(See the responses to these and other questions here and take the survey here).

The general consensus does seem to confirm my suspicions. Why is Wikipedia gaining? Here are the broad classes of explanations:

  1. It’s just a self-reinforcing process. The more people hear about and link to Wikipedia, the more people are likely to read it, and the more people are likely to edit it and improve it. But if that’s the case, why did Wikipedia ever get ahead of Mathworld and Planetmath? Two reasons: (i) its more radically open editing rules, and (ii) its coverage of many areas other than mathematics, which brings people to the site more in general. Also, since it covers many areas other than mathematics, it can better cover content straddling mathematics and other areas, such as biographies of mathematicians, and historical information that is relevant to mathematics. This creates a larger, strongly internally linked repository of information.

  2. Planetmath’s owner-centric model (as mentioned in one of the responses) where each entry is owned by one person, does not create a conducive environment for the gradual growth and improvement of entries.

  3. The appearance of content is better on Wikipedia. Prettier symbols, faster loading, better internal links, better search. This is definitely an advantage over Planetmath, which has slow load times in the experience of many users (as indicated by the comments above) though perhaps not so much over Mathworld.

  4. Google weights Wikipedia higher (because of the larger size of the website and the fact that a lot of people link to Wikipedia). This is related to (1).

  5. The people in charge of Mathworld and Planetmath simply lost interest. Mathworld is largely run by Eric Weisstein, an employee at Wolfram, who seems to have recently been trying to integrate metadata about mathematical theorems and conjectures into Wolfram Alpha. Developing Mathworld continually to a point of excellence does not seem to have been a top priority for Weisstein or his employer Wolfram Research (which hosts Mathworld) over the last few years. The people running Planetmath may also have become less interested in continually innovating.

Given all this, is Wikipedia the best in terms of: (i) the current product or (ii) the process of arriving at the product? While I’m far from a Wikipedia evangelist, I think that the answer to (ii) is roughly yes if you’re thinking of broad appeal. Anything which beats Wikipedia will probably do so by being more narrowly focused, but it may then not be of much interest to people outside that domain. A host of many such different niche references may together beat out Wikipedia for people who care enough to learn about a multiplicity of references. For those who just want one reference website, Wikipedia will continue to be the place of choice in the near future (i.e., the next 3-4 years at least, in my opinion).

Currently, Wikipedia is an uneasy mix of precise technical information and motivational paragraphs. It makes little use of metadata to organize its information; on the other hand, it is easy to edit and join in. The mathematics entries cannot be changed in ways that would make them look radically different from the articles on the rest of the site. This opens up many niche possibilities, some of which are being explored:

  1. Lab notebooks, where people store a bunch of thoughts about a topic, without attempts to organize them into something very coherent. Here, good metadata and tagging conventions could allow these random lab notebook-type jottings to cohere into an easily accessible reference. This would be the mathematical version of open notebook science, a practice that is slowly spreading in some of the experimental sciences. nLab (the n-category lab) is one example of a “lab notebook” in the mathematical context. This is great for motivation, and also for understanding the minds of mathematicians and the process of mathematical reasoning.

  2. Something that focuses on a particular aspect of mathematical activity. For instance, Tricki, called the Tricks Wiki, focuses on tricks. Other references may focus on formulas, others may focus on counterexamples, yet others (such as the AIMath wiki on localization techniques and equivariant cohomology) may focus simply on providing extensive bibliographies. Somewhat more developed examples include the Dispersive Wiki and the Complexity Zoo (actually, a computer science topic, but similar in nature to a lot of mathematics). Some may focus on exotic tricks of relevance to a particular mathematical discipline. There is some cross-over with lab notebooks, as the tricks become more and more exotic and the writing becomes more and more spontaneous and less subject to organization into an article.

  3. Highly structured content rich in metadata that is intended to provide definitions, proofs and clarify analogies/relations. Examples include the Group Properties Wiki [DISCLOSURE: I started it and am the primary contributor] which concentrates on group theory. The flip side is that the high degree of organization uses subject-specific structures and hence must be concentrated on a particular narrow subject.

There are probably many other niches waiting to be filled. And there may also be close substitutes for reference sites that weren’t created as references. For instance, Math Overflow, though not a reference site, may play the role of a reference site once it accumulates a huge number of questions and answers and adopts better search and specific tagging capabilities. Similarly, thirty years from now, the contents of Terry Tao’s weblog may contain a bit on virtually every mathematical topic, in the same way as Marginal Revolution has a bit on almost all basic economic topics (I say “thirty years” because economics is in many ways a smaller subject than mathematics).

November 16, 2008

Wikipedia — side-effects

In a recent blog post, Nicholas Carr talked about the “centripetal web” — the increasing concentration and dominance of a few sites that seem to suck in links, attention and traffic. Carr says something interesting:

Wikipedia provides a great example of the formative power of the web’s centripetal force. The popular online encyclopedia is less the “sum” of human knowledge (a ridiculous idea to begin with) than the black hole of human knowledge. At heart a vast exercise in cut-and-paste paraphrasing (it explicitly bans original thinking), Wikipedia first sucks in content from other sites, then it sucks in links, then it sucks in search results, then it sucks in readers. One of the untold stories of Wikipedia is the way it has siphoned traffic from small, specialist sites, even though those sites often have better information about the topics they cover. Wikipedia articles have become the default external link for many creators of web content, not because Wikipedia is the best source but because it’s the best-known source and, generally, it’s “good enough.” Wikipedia is the lazy man’s link, and we’re all lazy men, except for those of us who are lazy women.

This is an important and oft-overlooked point: when saying whether something is good or bad, we need to look not just at the benefit it provides, but also at the opportunity cost. In the case of Wikipedia, there is at least some opportunity cost: people seeking those answers may well have gone to the “specialist sites” instead of to Wikipedia.

Of course, it’s possible to argue that specialist sites of the required quality do not exist, but it can again be argued, in a counter-response, that specialist sites would have existed in greater number and greater quality if Wikipedia didn’t exist, or at any rate, if Wikipedia weren’t so much of a default. It might be argued, for instance, that of all the free labor donated to Wikipedia, at least a fraction of it could have gone into developing and improving existing “specialist sites”. As I described in another blog post, the very structure of Wikipedia creates strong disincentives for competition.

Wikipedia, Mathworld and Planetmath

In 2003, at a time when I was in high school and used a dial-up connection to reach the Internet, I was delighted to find a wonderful resource called Mathworld. I devoured Mathworld for all the hundreds of triangle centers it contained information on, and I eagerly awaited the expansion of Mathworld in other areas where it didn’t have much content. Because of the dial-up connection, I saved many of the pages for offline reference.

Later, in 2004, I discovered Planetmath. It wasn’t as beautifully done as Mathworld (Planetmath relies on a large contributor pool with little editorial control, as opposed to Mathworld, which has a small central team headed by Eric Weisstein that vets every entry before publication). But, perhaps because of less vetting and fewer editing restrictions, Planetmath had entries on many of the topics where Mathworld lacked entries. I found myself using both these resources, and was appreciative of the strengths and weaknesses of both models.

A little later in the year, I discovered Wikipedia. At the time, Wikipedia was fresh and young: some of the policies such as notability and verifiability had not been formulated in their current form, and many of the issues Wikipedia currently faces were non-existent. Wikipedia’s model was even more skewed towards ease of editing. It didn’t have the production-quality looks of Mathworld or the friendly fontfaces of Planetmath, but the page structure and category structure were pretty nice. Yet another addition to my repository, I thought.

Today, Wikipedia stands as one of the most dominant websites (it is ranked 8 in the Alexa rankings, for instance). More important, Wikipedia enjoyed steady growth both in contributions and usage until 2007 (contribution dropped a little in 2008). Planetmath and Mathworld, which fit Nicholas Carr’s description of “specialist sites”, on the other hand, haven’t grown that visibly. They haven’t floundered either: they continue to be at least as good as they were four years ago, and they continue to attract similar amounts of traffic. But I have a nagging feeling that Wikipedia really did steal the thunder, and that in the absence of Wikipedia, there would have been more contributions to these sites, and more usage of these sites.

The relation between Wikipedia and Planetmath is of particular note. In 2004, Wikipedia wasn’t great when it came to math articles — a lot of expansion needed to be done to make it competitive. Planetmath released all of its articles under the GNU Free Documentation License — the same license as Wikipedia. Basically, this meant that Wikipedia could copy Planetmath articles as long as the Wikipedia article acknowledged the Planetmath article as its source. Not surprisingly, many of the Planetmath articles on topics that Wikipedia didn’t have were copied. Of course, the Planetmath page was linked to, but we know where the subsequent action involved with “developing” the articles happened — Wikipedia.

Interestingly, Wikipedia acknowledged its debt to Planetmath — at some point in time, the donations page of Wikipedia suggested donating to Planetmath, a resource Wikipedia credited for helping it get started with its mathematics articles (I cannot locate this page now, but it is possibly still lying around somewhere). Planetmath, on its part, introduced unobtrusive Google ads in the top left column — an indicator that it is perhaps not receiving enough donations.

Now, most of the mathematics students I meet are aware of Mathworld and Planetmath and look these up when using the Internet; they haven’t given up these resources in favor of Wikipedia. But they, like me, started using the Internet at a time when Wikipedia was not in a position of dominance. Will new generations of Internet users be totally unaware of the existence of specialist sites for mathematics? Will there be no interest in developing and improving such sites, for fear that the existence of an all-encompassing behemoth “encyclopedia” renders such efforts irrelevant? It is hard to say.

(Note: I, for one, am exploring the possibility of new kinds of mathematics reference resources, using the same underlying software that powers Wikipedia (the MediaWiki software). For instance, I’ve started the Group properties wiki).

The link-juice to Wikipedia

As Nick Carr pointed out in his post:

Wikipedia articles have become the default external link for many creators of web content, not because Wikipedia is the best source but because it’s the best-known source and, generally, it’s “good enough.” Wikipedia is the lazy man’s link, and we’re all lazy men, except for those of us who are lazy women.

In other words, Wikipedia isn’t winning its link-juice through the merit of its entries; it is winning links through its prominence and dominance and through people’s laziness or inability to find alternative resources. Link-juice has two consequences. The direct consequence is that the more people link to something, the more it gets found out by human surfers. The indirect consequence is that Google PageRank and other search engine ranking algorithms make intensive use of the link structure of the web, so a large number of incoming links increases the rank of a page. This is a self-reinforcing loop: the more people link to Wikipedia, the higher Wikipedia pages rank in searches, and the higher Wikipedia pages rank in searches, the more likely it is that people using web searches to find linkable resources will link to the Wikipedia article.

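To add to this, search engines ignore external links from Wikipedia articles when computing rankings, because Wikipedia marks those links with the rel="nofollow" attribute. This is ostensibly a move to deter spam links, but it makes Wikipedia a net absorber of link-juice as far as search engine ranking is concerned: rank flows in through incoming links and is not passed on to the sites Wikipedia cites.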
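To make the link-juice argument concrete, here is a minimal sketch of a textbook PageRank power iteration on a tiny made-up web. Everything in it is an assumption for illustration: the four page names, the link pattern, the damping factor of 0.85, and above all the modeling choice that an ignored outgoing link simply vanishes, so that a page with no followed links spreads its rank evenly over all pages. Real search engines do not publish how they handle such links, so this shows only the direction of the effect, not its size.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to,
    counting only links that the search engine follows.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages with no followed outlinks spread their rank evenly (assumption).
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its rank equally among its followed outlinks.
        for page in pages:
            for target in links[page]:
                new_rank[target] += damping * rank[page] / len(links[page])
        rank = new_rank
    return rank


# Hypothetical four-page web: two blogs, one specialist site, one "wiki".
followed = {
    "wiki": ["specialist"],            # the wiki cites the specialist site
    "specialist": ["wiki"],
    "blog_a": ["wiki"],
    "blog_b": ["wiki", "specialist"],
}
# Same web, but the wiki's outgoing link is ignored by the ranking.
nofollow = dict(followed, wiki=[])

ranks_followed = pagerank(followed)
ranks_nofollow = pagerank(nofollow)

for page in followed:
    print(f"{page:10s} followed={ranks_followed[page]:.3f} "
          f"nofollow={ranks_nofollow[page]:.3f}")
```

In this toy web the specialist site's rank falls sharply once the wiki's outgoing link stops counting, while the wiki's own rank rises slightly. Part of the rank the wiki used to pass on leaks back to it through the even redistribution, which is itself only one possible convention.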

In addition, the way people link to Wikipedia is also interesting. Often, the anchor text of a link to a Wikipedia article gives no indication that the link goes to Wikipedia. Rather, the anchor text simply gives the article name. This sends the message to readers that the article on Wikipedia is the first place to look something up.

Even experienced and respected bloggers do this. For instance, Terence Tao, a former medalist at the International Mathematical Olympiad and a mathematician famous for having settled a conjecture regarding primes in arithmetic progressions, links copiously to Wikipedia in his blog posts. To be fair, he also links to articles on Planetmath and papers on the arXiv in cases where these resources offer better information than the Wikipedia article. Nonetheless, the copious linking suggests that not every link to a Wikipedia article reflects the Wikipedia article genuinely being the best resource on the web for that content.

What can we do about it?

Ignoring a strong centripetal influence, such as an all-encompassing knowledge source, does not make us immune to its pull. There is a strong temptation to use Wikipedia as a “first source” for information. To counter this pull, it is important to be both understanding of the causes behind it and critical of its inevitability.

The success of a quick reference resource like Wikipedia stems from many factors, but two noteworthy ones are the desire to learn and grow, and laziness. Our curiosity and desire to learn lead us to look for new information, and our laziness prevents us from exerting undue effort in that regard. Wikipedia capitalizes on both traits in its readers (quick and dirty access to lots of stuff immediately), its contributors (easy edit-this-page), and its linkers (satisfying reader curiosity by providing web links, but linking to Wikipedia rather than to other sites out of laziness). Wikipedia is what I call a “pinpoint resource”: something that provides very quick one-stop access to very specific queries over a large range of possibilities.

For something to compete with Wikipedia, it must cater to these fundamental attributes. It must be quick to use, provide quality information, and encourage exploration without making things too hard. It must be modular and easily pinpointable. This doesn’t necessarily mean that everything should be modular and easily pinpointable, since there are other niches that don’t compete with Wikipedia. But to compete for the “quick-and-dirty” vote, a site has to offer at least some of what Wikipedia offers.

Of course, one of the questions that arises naturally at this point is: isn’t Wikipedia “good enough” to satisfy passing curiosities? I agree that there is usually no harm in using Wikipedia when the alternative is ignoring one’s curiosity. But I emphatically disagree with the idea that funneling people’s passing curiosities, and their desires to learn new stuff and teach others, through Wikipedia is the best we can do. Passing curiosities can form the basis of enduring and useful investigations, and the kind of resource people turn to at first can determine how the initial curiosity develops. For this reason, if Wikipedia is siphoning off attention from specialist sites that do a better job, not just of providing the facts, but of fostering curiosity and inviting exploration, then there is a loss at some level.
