July 29, 2009

Information costs and open access

Filed under: Uncategorized — vipulnaik @ 4:59 pm
In recent years, there has been a growing trend towards “open access” among librarians and academics. For instance, the University of Michigan recently held an Open Access Week, for which it described open access as:

free, permanent, full-text, online access to peer-reviewed scientific and scholarly material.

In an earlier blog post, I discussed some issues related to open access. Here, I attempt to look at the matter more comprehensively.

Rationales for open access

There are many rationales for open access. The simplest rationale is that open access means reduced cost of access to information, which allows more people to use the same research. Since the marginal cost of making research available via the Internet to more people is near zero, it makes sense from the point of view of efficiency to set the price of access for each additional reader at zero.

Another rationale is a more romantic one: making scientific and scholarly publishing openly available allows for a free flow of ideas and a grander “conversation”. This rationale also implies that open access should be more than just free (in the sense of zero cost) access to materials; it should also come with a license that permits liberal reuse of research materials in new contexts. Academia already has strong traditions of quoting from, linking to, and building upon past work, but this form of open access seeks to provide a legal framework that explicitly specifies reuse rights that go beyond the traditional copyright framework of countries such as the United States. The Creative Commons licenses are an example of such permissive licensing.

As shorthand for these two rationales, I shall use cost rationale and conversation rationale.

Open access policies/mandates

One of the major problems the open access movement has faced so far is getting people to publish papers in open access journals. As long as the best papers continue to be published in closed-access journals, academics who want to read these journals will pressure their university libraries to subscribe to these journals, even when the journals overcharge. Thus, librarians are unable to push open-access terms on publishers. (more…)

July 8, 2009

About “topic pages”: newspapers and others

Filed under: Web structure — vipulnaik @ 12:29 am

In a previous post mainly about the Tricki, I conjectured that in order to make information easiest to access, the Tricki should seriously consider the use of topic pages that serve as central hubs from which users of the Tricki can get an idea of the Tricki’s coverage of a particular topic. I argued that, even in the presence of a website such as Wikipedia that has a high-quality article on the topic (though whether it does is debatable, in my view), such a page serves a useful local function. I also pointed to the example of New York Times Topic pages. See, for instance, the Times Topic pages on Wikipedia, Twitter, and The Amazon Kindle.

Apparently, as I learned from this Register article, Google, in a testimony by its employee Marissa Mayer (PDF), said that:

Consider instead how the authoritativeness of news articles might grow if an evolving story were published under a permanent, single URL as a living, changing, updating entity. We see this practice today in Wikipedia’s entries and in the topic pages at NYTimes.com. The result is a single authoritative page with a consistent reference point that gains clout and a following of users over time.

To paraphrase, I think Mayer’s point is that individual news stories come and go and are hard to search, find, and catalogue, whereas a single evolving story can gain clout over time as people refer to it and know it’ll be there. Apparently, another blogger saw a connection between the issue of topic pages for journalists and my suggestion to have topic pages on the Tricki.

Two kinds of users

I have no knowledge about building newspaper websites; in fact, I have extremely limited knowledge about building websites. So what I’m saying here should be treated largely as amateur speculation.

I broadly see two reasons why people visit sites on the Internet. One is to get the latest information: they don’t know exactly what they’re looking for, but they want something new. For instance, when I type the address of a newspaper or periodical website into my browser’s URL bar, I’m not looking for specific information, but rather, looking for news. At the other extreme are situations where I am looking for specific information. For instance, I want to know whether there exists an infinite group where every element has finite order.

Of course, much web use lies somewhere in between. For instance, when I follow a subject-specific blog, such as this or this or this, then what I’m looking for is “new” information but restricted to something specific rather than the general wide swath of things that could happen to the world. Something similar may be said for social networking, in so far as the information-gathering there is related to updates and new information but limited to friends and others in one’s network.

On the Internet, these two forms of exploration serve each other. As I read a news article in the New York Times or an editorial in Foreign Policy, there are terms, concepts, ideas, historical facts, etc. that I encounter about which I don’t know enough. I am curious about other coverage of these. In this case, an initial desire for news in general gives rise to specific questions. Conversely, sometimes I get stuck at specific questions, so I do a more general exploration, looking for news that might shed light on one of the many specific questions I have unresolved.

An astute reader might point out that if the main value of “new” information is to provide stimulus for curiosity, the “new” feature isn’t crucial at all. Reading the New York Times of ten years ago would probably stimulate my curiosity just as much as reading that of today. In some ways, it may do so even more. I agree with this, but I also suspect that we humans have a somewhat irrational bias towards favoring the “new” information. (Personally, I enjoy reading archives and past news coverage, as well as books written in a different era, though I don’t do much of this. Reading their predictions and comparing them with what actually happened is both amusing and sobering).

The Internet has changed the dynamics of how we work by meeting both these kinds of needs, and allowing us to jump from one to the other. On the one hand, newspapers and periodicals seem to be in a suicidal race to put up more and more of their content on the Internet, creating an enormous stream of new information. On the other hand, basic reference material on the Internet is also growing rapidly. The most high-profile example of this is Wikipedia, the free online encyclopedia, which I’ve talked about in the past. But there are other, more low-profile examples, too.

For instance, the online bookseller Amazon is just one among many freely available online book catalogues. Many other retailers as well as libraries now maintain extensive online catalogues. Sites such as HowStuffWorks and wikiHow have a lot of basic, useful how-to information. More importantly, many organizations, including corporations and non-profits, are putting their data and advice online in a fairly well-organized form. Consider, for instance, societies devoted to medical conditions in the United States, such as the Alzheimer’s Association. Or consider the many governmental websites (such as in the United States, but now also increasingly in less developed countries such as India) containing forms and help information that one would earlier need to go to a government office to obtain. Or consider the large amount of mapping information available that allows people to plan trips and find locations.

Linking the two forms of exploration better

Note: I use the term “newspaper” here as a generic term that may refer to a newspaper, periodical, or an online-only news/news commentary service.

Newspapers fulfill our need for “new” information, for information about things around us that we didn’t know we were looking for. For many of us, most of the time, this itch for new information is enough; we like to read and then forget the information and not think more about it. But it often happens that a particular news story makes one curious about further information on the topic. For instance, reading a story about recent progress in reducing infant mortality in an African country led me to wonder: what has been the general progress on infant mortality over the past century? Which countries have led and which countries have been behind?

To meet this kind of need, the newspaper needs to do something more than put up an online version of a print article: it needs to link to pages that give a better overall flavor of the subject, which may include past newspaper coverage on the topic, links to (in this case) sites on health/medicine, sites on secular demographic trends, etc. For instance, the New York Times has a topic page on infant mortality, from which I could see all the past coverage the New York Times had on the subject, as well as a multimedia chart comparing infant mortality across different countries and over time. Ideally, there would be a lot more information on the topic, neatly organized in the topic page, but this is a start. Unfortunately, many of the articles that the New York Times lists on this page (such as this one) lack backward links to infant mortality. (Incidentally, according to this blog post, the New York Times plans to release, in usable form, the linked data cloud that it uses for topic tags.)

Unfortunately, building this kind of topic page structure is a daunting challenge for news and news commentary organizations smaller than the New York Times. Many, such as Newsweek, have settled on a somewhat intermediate solution: they have topic pages, but the explanation part at the top is simply pulled from Wikipedia (with attribution) and all pages referencing the article are listed in a sortable table. (See, for instance, the Newsweek topic page on health care costs.) Other organizations, such as The Huffington Post, have tried a different, more sophisticated approach. See, for instance, the Huffington Post pages on John McCain and on Wikipedia. However, even the Huffington Post does not advertise this extensive use of tags as a way of searching. Many smaller news organizations simply rely on search rather than any (externally visible) tagging system, probably because their scale and scope make investment in tagging systems not worthwhile.

One of the solutions for the problem of lack of scale (i.e., it is hard for small news organizations to create separate articles or a separate taxonomy) may be for multiple news organizations to pool together their taxonomies, and even to share topic pages and results. The Washington Post/Newsweek company, to take one example, has such publications as the Washington Post, Newsweek, Slate, Foreign Policy, and The Root. These already link to each other, but by having pooled-together topic pages, they can reap more benefits. Small news organizations that do not want to invest in taxonomies can use public taxonomies, or those contributed by larger news organizations (for instance, that of the NYT).

Topic pages versus search

Some people have argued that powerful search features have made things such as tagging, categorization, and other forms of arrangement largely redundant. There is some truth to this argument. To quite an extent, the utility of categorization was to speed up extremely slow and cumbersome search processes, and the existence of quick computer-enabled search reduces that need.

But a close look would reveal that tagging today is a lot more powerful and useful than it could have been in a pre-digital era. Now, because of the lack of constraints of physical placement, the same topic can receive numerous tags. These tags allow a much larger number of cross-connections to be discovered. So, while the need for unsophisticated users to use tags/categories may have considerably reduced, the scope for their use by even marginally interested and sophisticated people has expanded considerably.
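To make the point about cross-connections concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the article titles, the tags, and the function names `build_tag_index` and `cross_connections` are invented, and nothing here is meant to describe how any actual newspaper stores its tags.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical articles and tags, invented purely for illustration.
ARTICLES = {
    "Story on infant mortality in an African country": {"infant mortality", "africa", "health"},
    "Story on health care costs": {"health", "economy"},
    "Story on a drought in East Africa": {"africa", "climate"},
}

def build_tag_index(articles):
    """Map each tag to the set of article titles that carry it."""
    index = defaultdict(set)
    for title, tags in articles.items():
        for tag in tags:
            index[tag].add(title)
    return index

def cross_connections(articles):
    """Return each pair of articles that shares a tag, with the shared tags."""
    links = {}
    # Titles are unique, so sorting the (title, tags) pairs only compares titles.
    for (a, tags_a), (b, tags_b) in combinations(sorted(articles.items()), 2):
        shared = tags_a & tags_b
        if shared:
            links[(a, b)] = shared
    return links

if __name__ == "__main__":
    for tag, titles in sorted(build_tag_index(ARTICLES).items()):
        print(tag, "->", sorted(titles))
    for pair, shared in cross_connections(ARTICLES).items():
        print(pair, "share", sorted(shared))
```

Because one article can sit under several tags at once, the drought story and the infant mortality story get connected through "africa" even though a single-shelf physical filing system would have kept them apart.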

But the more important point is that a central hub, such as a topic page, is more than a mere tag. It is an intelligent starting point because of the way it collates together diverse kinds of information related to a topic. For instance, consider the New York Times topic page on Iran. This is more than just a list of pages referencing Iran, and its value lies in the way it gives a clear idea of the way the newspaper covers Iran.

Genuine advantage

News organizations in the United States are currently (as of 2009) experiencing severe financial problems, and have often been criticized for not innovating and adapting to the Internet culture. But even a little sampling of newspaper websites such as the New York Times, Time, Newsweek, the Wall Street Journal, and the Chronicle of Higher Education shows that news organization websites have been very innovative in adapting to the Internet to improve the quality of presentation of news even as they have to make painful cuts to their news staff.

In the past, newspapers were a one-stop shop for all kinds of information such as weather information (now, weather services across the world have websites and can send alerts to mobile phones), stock price information (again, stock exchange movements are now publicly available on their own and many other websites), and real estate and matrimonial information (that got them a lot of their ad revenues). In practically all these respects, there will probably never be a single one-stop shop.

There are two things news organizations should continue to be good at, and should lead in almost by definition: gathering news on-the-ground (thus meeting people’s instant-news-need) and providing a framework linking current news to past coverage (thus providing a vast reference corpus for answering specific questions people have, particularly those raised by current events). It is in the second respect that the shift to online news really helps improve news quality. But for this, there needs to be an effective mechanism to link the dots. I assume that part of a journalist’s training involves linking the dots and placing the news being gathered in the context of their knowledge of history and past events. By making these links more public and sharing them with readers, news organizations can significantly improve the news value of their content.

As more and more amateur news gatherers come on to the scene and break a lot of news, even the ability of news organizations to always be first with breaking news may be in question (though newspapers are currently far ahead). Their large corpus of well-organized internal news archives can, however, give these organizations one advantage that will be harder to eliminate: the ability to put new stories in a broader perspective and grow topic pages and central hubs to provide readers with a quick overview of practically any topic.


Concerns have been raised by many that the New York Times does not link to source content even when such content exists on the web, and that a large fraction of its links are internal links to Times Topic pages. Some suspect that it is a strategy to maximize PageRank juice. This is definitely a valid concern, though I think that of late, the Times has started to link more to outside sources as well. It definitely links freely in blog entries, and at least it links to websites when talking about content specific to those websites. In any case, whether the Times is suppressing outward links at a cost to readers is largely tangential to the question of the utility of Times Topic pages. Here is a book chapter about linking practices.

July 7, 2009

Million words on the first-year: Mockumentary of first-year life at Chicago

Filed under: Chicago,Places and events — vipulnaik @ 10:10 pm

It is said that a picture is worth a thousand words. A six-minute video is probably a thousand pictures.

There is a tradition in the mathematics department of the University of Chicago whereby, at the end of each academic year, the second-year class organizes an evening of skits called “Beer Skits” (“beer” because the skits are accompanied by huge servings of beer). As part of the tradition, we decided this year to create a mockumentary (mock documentary) of life in the first year at Chicago. I had the original concept and fleshed out the main scenes, but a lot of the editing work before, during, and after the shooting was done by many of my batchmates, who all added their own insights and removed some of my original ones that would have been too abstruse.

Our six-minute video is a little funny, but it is also quite realistic, as those who have gone through a similar experience can attest. Okay, agreed, some of the scenes towards the end stretch the boundaries of realism, but almost everything we have is a slight adaptation of something that actually happened during our first year.

Below is an embedded YouTube video (you can make it full-screen for better viewing).

Download a high-resolution, 3 Mbps, DVD-quality version in Windows Media Video format (104 MB).

Extensible automorphisms problem solved long ago

Filed under: My research — vipulnaik @ 10:00 pm

Some friends as well as other regular readers of this blog may remember that I mentioned the extensible automorphisms problem: if an automorphism of a group can be extended to any group containing it, must that automorphism be inner?
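For readers who want the statement spelled out, here is one way to formalize it. This is a sketch; the symbols $G$, $H$, $\sigma$, and $\sigma'$ are notation introduced here for illustration and are not taken from the original papers.

```latex
Let $G$ be a group and $\sigma \in \operatorname{Aut}(G)$. Call $\sigma$ \emph{extensible} if
\[
  \text{for every group } H \text{ with } G \le H,\
  \text{there exists } \sigma' \in \operatorname{Aut}(H)
  \text{ with } \sigma'|_G = \sigma .
\]
Every inner automorphism is extensible: if $\sigma(x) = g x g^{-1}$ for some $g \in G$,
then $\sigma'(h) = g h g^{-1}$ is an automorphism of $H$ that restricts to $\sigma$ on $G$.
The problem asks for the converse:
\[
  \sigma \text{ extensible} \;\Longrightarrow\; \sigma \in \operatorname{Inn}(G)\,?
\]
```

The easy direction shown in the middle is what makes the question natural: inner automorphisms extend for free, so the question is whether anything else can.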

I had managed to prove this for finite groups, and had drafted a short paper with that and some other related results. My advisor George Glauberman then shared the results of this paper with some other mathematicians, and one of these people, Avinoam Mann, recalled that these problems had been the subject of papers in 1987 and 1990. You can see the extensible implies inner page on the groupprops wiki for more details.

Since I came up with the question myself, and also came up with the final answer myself (of course, with help and guidance from many others), I actually feel vindicated rather than distressed that this question was considered and solved long ago — it shows that the questions I ask have been asked by other people before, and been considered worth publishing.

In mathematics, one may be tempted to feel that it is getting harder and harder to get original results because more and more of the simple stuff has been taken. At the same time, as we acquire more and more knowledge, new researchers can stand on the shoulders of their predecessors and therefore ask and answer deeper questions. I feel that the benefits of being able to look farther and deeper outweigh the costs of having all the more basic results “already taken”.

I still do have a few additional results that I believe to be original, such as this one and its corollary, this one. I also have some partial results on the NPC conjecture, which I believe are new, such as this and others discussed on the NPC conjecture page. I hope to put these together with other results that I have yet to discover and cobble together something publishable in the not-too-distant future.

