
The Closed Future of Open Source

The one thing about Open Source that always seems unsustainable to me is that the contributors don’t get any significant share of the value they create.

For instance, Linux and MySQL are open-source projects that bring in millions of dollars in profits for multiple companies (that actually sell/consult for these products). And yet, little money trickles down to the code contributors – certainly not a significant percentage of total profits made by their ecosystems. The Open Source argument is that the appropriate incentive for code contributors is that they get to improve the software for their own use and the value created for other users is merely a “side effect”. But when that “side effect” runs into millions of dollars, we must ask – can we compensate contributors more fairly?

And contributors are even more important to big open source projects today. Gone are the days when a part-time developer could add a feature he needed by mucking around with the code. As businesses have come to rely on open source, expectations of these products are much higher and feature requests are more complex. Such products now need a consortium of contributors who have spent considerable time building expertise in the product, and who are available to address complex requirements (plus timely patches and other deliverables). In other words, big open source projects now need full-time developers. And full-time developers need to be paid.

In short, contributors need to be paid both because the project needs their energies full-time (i.e., the project is their day job) and because they should receive a fair share of the value they create. How might we formalize a set of rules under which the next generation of open source projects might accrue value to contributors more fairly? I believe the answer consists of three aspects – allowing open source software to be sold for a price, having rules to fairly distribute the proceeds among full-time contributors, and allowing contributors to protect their contributions by discouraging forking of code.

The simplest solution to compensate contributors is to sell the open source software for a price, and have the contributors share in the proceeds. One way to do this is to have all contributors form a partnership or a consortium to own the project, and then sell the software for a fee to whoever needs it (either consulting companies or directly to customers). The membership of the consortium could change over time (as old contributors retire and new contributors join); the project would continue since the consortium would “own” the project. In effect, RedHat does exactly that, but I’m proposing that this type of partnership be formalized (along with rules for deciding how profits get distributed).

One unfortunate side-effect of allowing open source software to be sold for a price is that forking of code needs to be discouraged. You can’t have anybody picking up a code-base built over the years and offering it for free (when the contributors want to charge for it). But on the other hand, open source has been successful precisely because of unbridled sharing and collaboration. I am unclear as to how to solve this problem. A few open source projects “solve” it by keeping some piece of the product closed source (RedHat, Android). I think there needs to be a definition of “fair code reuse” similar to the “fair use” doctrine in copyright law, and it needs to be formalized as a license.

In other words, what we need is a legal entity that represents the open source project and its contributors, and mechanisms to compensate contributors and protect their hard work. I am convinced that an appropriate legal framework would spur even more innovation in software development.

But would we end up with something that looked indistinguishable from closed source projects? Not really. One thing I have always admired about Open Source projects is that they foster dynamic communities of contributors. Because the code is openly viewable, it enables a “long tail” of part-time contributors to participate, and enables a new generation of leaders to emerge (as some of these part-time contributors transition to being paid full-time contributors). This is something worth keeping, and something that will live on – even closed-source software projects will end up being more “open”, in that their code will be viewable to a large set of people, many of whom will be able to provide feedback, make minor changes and emerge as the next set of leaders. There really is no better way to hire future team members for any software project.

So hopefully, we’ll end up somewhere in the middle. But in this new reality, open source won’t be free. And it won’t be that open anymore.

Thanks to Eric Fleischman, Jeenandra Kumar and Dr. Gabriel Ferrer for feedback on initial drafts.

The Internet Is Not Optimized For Paid Content

Imagine if the internet functioned like an app store.

There would be “free” webpages/websites (like today) and “paid” webpages/websites. The free web pages would be accessible without any cost, just as they are today (like a free app in a smartphone app store). Paid web content, however, could be purchased with no more than two clicks – the first click to agree to pay for the content, and the second click to confirm the action.

Such a mechanism would make paid content much more viable.

The conventional wisdom argues that content publishers are going out of business today because content is abundant, and it’s hard to charge for it. I am not convinced that is the whole truth. The other side of the story is that the internet was not designed for paid content. It is hard to get people to pay for content on the internet (each site must ask for payment information, etc.). An unwritten assumption has always been that information will be available for free, to all. That is one of the reasons internet advertising is so prevalent – the internet is not optimized for paid content.

Today, most paid content sites must ask users to enter their credit-card details from scratch and create new sign-in credentials. It’s so hard to get users to pay for content that it is no wonder such sites ask users for long-term subscriptions. A solution that allows two-click payment for web content would enable “a la carte” content consumption – a user would pay only for the webpages they actually read, which in turn would make it more likely that users pay for content at all.

How do you make two-click content payment pervasive? I believe that the internet needs a standards-based layer (on top of current internet mechanisms) that enables easy payment.

Some may argue that all the pieces already exist. There are payment intermediaries like PayPal, Google Checkout, Amazon, etc. Then why isn’t the paid content model more prevalent? The short answer is that fragmentation is a vicious cycle (see the table below for the more complete answer). Fragmentation without an underlying standard means that content providers must implement support for multiple payment intermediaries. This reduces adoption of the paid model, which in turn reduces the market for paid content, which in turn causes intermediaries to charge heftier commissions (if they want to turn a profit). And of course, heftier commissions reduce adoption even more. Moreover, most users don’t have accounts with these intermediaries, and do not want to manage another account online.

An internet standard for easy payments would fix all these issues by eliminating fragmentation. Content providers can accept payment from any and all intermediaries simply by implementing the standard. This would reduce the barrier to supporting paid content model, expand the market for paid content, and bring down commissions. But more importantly, it would enable common identity providers (like Google, Yahoo and Facebook) to act as payment intermediaries, allowing users to “login to the internet” once for all the functions they want to perform. This would reduce barriers to payment on the internet. Finally, a standard ensures that there will be many more competing intermediaries, and the competition will keep intermediary costs down.
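To make this concrete, here is a minimal sketch of how a two-click purchase might flow under such a standard. Everything here – the `PaymentRequest` shape, the `Intermediary` interface – is invented for illustration; no such internet standard exists.

```python
from dataclasses import dataclass

# Hypothetical sketch only: all names below are invented to illustrate
# the proposed two-click flow, not any real payment API.

@dataclass
class PaymentRequest:
    provider: str       # content provider's identifier
    article_url: str    # the paid page being purchased
    price_cents: int

class Intermediary:
    """Any identity provider implementing the (hypothetical) standard."""
    def __init__(self):
        self.pending = {}

    def first_click(self, user: str, req: PaymentRequest) -> str:
        # Click 1: the user agrees to pay; the intermediary records intent.
        token = f"{user}:{req.article_url}"
        self.pending[token] = req
        return token

    def second_click(self, token: str) -> bool:
        # Click 2: the user confirms; the intermediary settles the charge
        # against the account the user already holds with it.
        req = self.pending.pop(token, None)
        return req is not None  # True means the payment settled

intermediary = Intermediary()
req = PaymentRequest("times.example", "https://times.example/article", 33)
token = intermediary.first_click("alice", req)
assert intermediary.second_click(token)  # two clicks, payment complete
```

The point of the standard is the interface: a content provider written against `Intermediary` would work unchanged with every competing intermediary that implements it.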

Factor: Cost. Content providers don’t want to fork out the 2-4% commission that online payment intermediaries charge, in addition to credit-card commissions; they’d rather have users enter credit-card information directly on their site.
How an internet standard helps: It would promote competition among payment intermediaries, bringing down commissions.

Factor: Complexity. Most users don’t have accounts with payment intermediaries (and have no incentive to create one).
How an internet standard helps: It enables everyday identity providers (GMail, Yahoo, Facebook) to become payment intermediaries, eliminating the need for users to sign up with another entity.

Factor: Fragmentation. There are too many payment intermediaries (PayPal, Google Checkout, etc.), requiring content providers to support each one to ensure a seamless experience for users.
How an internet standard helps: It is desirable to have multiple intermediaries competing for users; a standard ensures that content providers can easily support all of them (as long as they implement the standard).

Why isn’t paid content more prevalent on the internet?

The New York Times is a widely read newspaper, and yet loses at least 50 million dollars a year (in spite of all the advertising revenue it generates from its website). It has about 150 million readers visiting the site annually – if each user spent just 33 cents annually, The New York Times would be breaking even.
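The arithmetic behind that break-even figure is straightforward:

```python
# Break-even arithmetic for the New York Times example above.
annual_loss_dollars = 50_000_000    # reported annual loss
annual_readers = 150_000_000        # annual site visitors

# Revenue needed per reader per year to cover the loss, in cents:
per_reader_cents = annual_loss_dollars / annual_readers * 100
print(round(per_reader_cents, 1))  # → 33.3
```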

With a standardized payment solution for the internet, collecting those 33 cents from each reader would be easy.

The Audio Medium: A Third World Revolution Waiting to Happen

A significant percentage of the world’s population cannot read.

In India, home to a seventh of the world’s population, the functional literacy rate stands at an abysmal 20%, in spite of heavy investments in improving literacy. The story probably repeats itself in other parts of the developing world. Indeed, over the last 50 years, the revolutionary information media – like TV and radio – have been revolutionary precisely because they did not involve the written word.

And yet, the written word remains more important than ever. TV and radio cannot deliver the long tail of content that is available through books. The small-town plumber or the rice farmer cannot rely on TV or radio to deliver all the specialized content they need. Worse, the advent of the internet has significantly increased the volume of information stored in text form. Yet this content remains inaccessible to the large numbers of citizens of the world who cannot read.

Does there exist a medium better than text that might make this long tail of content accessible to the under-privileged of the world?

Ideally, a replacement solution for books must satisfy three criteria.

Solution             Space-decoupled   Time-decoupled   Cheap to consume   Easy sharing
TV with DVR                ✓                 ✓                ✗                 ✗
Video on internet          ✓                 ✓                ✗                 ✗
Audio on internet          ✓                 ✓                ?                 ?

A comparison of solutions based on various media
The more the checkmarks in a row, the better the medium is.
A “?” means that the solution does not satisfy that property today,
but it is technically feasible to support that property.

First, it must decouple content publishing from content consumption, in both time and space. Before books became popular, knowledge transfer happened through the spoken word. However, the spoken word required the listener (content consumer) to be in the same place as the speaker (content publisher), at the same time. Books removed these restrictions – that is why they were so revolutionary. It is interesting to note that both TV and radio decouple publishing and consumption in the space dimension, but not in the time dimension. In other words, a consumer can consume TV content anywhere in the country, but must tune in at the precise time the content is broadcast. Aside from books, I can think of only three solutions that decouple publishing from consumption in both space and time: TV set with DVR, Video over the internet and Audio over the internet.

Second, the medium has to be cheap to consume. Only then would it be accessible to the developing world. Ideally, the consumption cost is so cheap, that each person can consume content independently. TV with DVR fails this requirement – even without DVR, a TV set is expensive enough that villages in India can afford only one for the whole village. Video over internet/wireless also fails this requirement, since it drains too much power to render video, even on a cellphone. This is no small matter in a country where villages don’t have electricity, and people pay mobile charging stations just to get their cellphones charged. On the other hand, audio (i.e., podcasts and audiobooks) sounds promising. Even a cheap feature-phone can be made to play audio content. And cellphones have high penetration in the developing world, which ensures that the medium is very accessible.

Third, the medium has to enable easy sharing of content. A farmer in rural China is not going to download audio content off the internet. He will most likely acquire audio content from an acquaintance. In fact, it is desirable that he acquire the content from an acquaintance in a way that does not involve the mobile provider’s network (for example, bluetooth). This would save mobile bandwidth, reduce the cost of consuming the content and make the solution easier to scale (also see footnote). This is technically feasible, but not implemented today. Cellphones will need to support easy phone-to-phone transfer of audio content (with appropriate safeguards for paid content) using bluetooth or other such technology.

But even if cellphones support easy sharing of content, will there be enough free content to make this feature useful? I believe there will be. Podcasts may be a niche medium in the U.S., but there will be enough demand for audio content in the developing world that it will be as ubiquitous as blogs are in the western world.

Finally, the audio medium has two other advantages that no other medium has.

  • All the existing content in text form can automatically be converted into audio form. This is huge, because it makes all existing text content accessible to the developing world.
  • And most importantly, it is a medium that doesn’t require your full attention – you can listen to audio content while performing other tasks.

In summary, I believe that cellphones with (a) the ability to play podcasts/audiobooks and (b) the ability to easily share audio content with other people could usher in a revolution in the developing world.

Five centuries ago, the written word replaced the spoken word as the dominant means of information transfer. I am rooting for the spoken word to stage a comeback.

Note: This is also the reason why personalized feeds of audio content will never work for a country like India. Each feed would need to be delivered by a central system over the mobile provider’s network, and would take too much bandwidth to ever be viable.

A Short History of Content Discovery on the Web

My wife’s blog is indexed by Google. I am not sure how that happened.

She didn’t ask for it to be indexed. So how did the blog get into Google’s index? OK, so maybe Google followed a link to my wife’s blog from a web-page already in its index.  But that means that the web-page owner would have had to discover the blog somehow (to add a hyperlink to his page). So how exactly would a web-page owner discover the blog? (answer at the end of the post)

Nobody thinks too much about this, but how do users (and search engines) discover content on the web?

Remember that “discovering” content from results of a search engine query doesn’t count as content discovery – that’s actually content rediscovery. Before a user can discover new content through search engine results, the search engine itself has to discover said content. Surprisingly, even the search engine’s discovery of new content by following links is not content discovery. Before a search engine can discover new content by following a link, some web-page owner had to discover the new content and needed to add a hyper-link. I’m more interested in that fundamental process by which newly published content is discovered by other users.

I count three fundamental methods by which users (and search engines) discover content:

  1. Portals: Users go to a well-known website (like New York Times) and find new content. Search engines also may have a list of well-known websites to index by default.
  2. Social Network: 10 years ago, this meant email. A content publisher might send an email to all his friends. If the content was good enough, it may get forwarded, and linked to by a wider set of people. Today, it encompasses Facebook and Twitter.
  3. Interest Groups: I use this term to mean groups of people interested in the same thing getting together to exchange information. 10 years ago, this meant distribution lists and usenet groups. Today, it means Yelp, Twitter, etc.

Note also that I do not consider advertising to be a significant mechanism for content discovery. The primary purpose of advertising, in my view, is for discovery of products and services available in the real world; it is rarely used to advertise information available online.

It is interesting to list out how each of the three mechanisms of content discovery has evolved to deal with the higher rate of content generation on the web. Notice how Twitter has innovated on two of the content discovery mechanisms (while Facebook has innovated on only one). This is why I believe that it is hard to beat Twitter as a content discovery network.

Portals

10 years ago: Portal content used to be editorially managed. A small group of (paid) writers would submit articles for review to the editors (typically salaried employees), who would then approve them for publication on the portal.

Today: Using a small group of paid employees to exert editorial control does not scale. The most popular portals (like Yelp) rely on crowd-sourced content. A community of users not only generates content, but also reviews and validates it. Even The New York Times crowd-sources some content by allowing users to submit comments on articles.

Social Network

10 years ago: Email was pretty much the only alternative. If you were a content publisher, you’d send an email out to your friends, and if they found it interesting, they would forward it further. Many successful portals (Angie’s List, for example) actually started out that way.

Today: Facebook (and, to a lesser extent, Twitter) has made it easy to inform your social network about interesting content. Both Facebook and Twitter also make it easy to re-publish/retweet (the equivalent of email forwards) interesting content from someone in your network.

Interest Groups

10 years ago: Usenet groups were the primary mode of communication among users with similar interests. This was cumbersome in that you actually had to get approval to create a usenet group. A secondary approach was to create a distribution list.

Today: The web has multiple ways of organizing interest groups. Do you want to participate in the interest group for a local business? Head over to Yelp (all reviews for a local business constitute an interest group). Do you want to participate in the interest group for a product? Head over to the reviews for that product on Amazon. Interest groups have become much more fine-grained and ubiquitous.

More than anyone else, this is one space that Twitter has revolutionized (Facebook, not so much). Twitter has made it easy to create an interest group for content – in two ways:

  1. First, each person (and his/her followers) form a virtual interest group. I have found myriad interesting users through retweets, and I have ended up following their content (in essence, subscribing to a distribution list of their content).
  2. Second, Twitter’s search mechanism (hashtags) allows content streams to be instantly organized into interest groups. For instance, when I follow the hashtag #obama on Twitter, I have the means to track content related to President Obama. Where users were forced to post to multiple usenet groups 10 years ago, a single tweet with multiple hashtags suffices on Twitter today.
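Mechanically, the hashtag model amounts to a simple grouping operation over the tweet stream: each tag names an ad-hoc interest group, and one tweet can land in several groups at once. A toy sketch (tweet data invented for illustration):

```python
import re
from collections import defaultdict

tweets = [
    "Healthcare vote tonight #obama #politics",
    "New podcast on rural connectivity #audio",
    "Press conference recap #obama",
]

# Each hashtag names an interest group; a tweet with several hashtags
# joins several groups at once (no cross-posting needed, unlike usenet).
groups = defaultdict(list)
for tweet in tweets:
    for tag in re.findall(r"#(\w+)", tweet):
        groups[tag].append(tweet)

print(len(groups["obama"]))  # → 2
```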

Oh, and about how my wife’s blog got into Google’s index. It could be one of two ways:

  1. I have often referenced my wife’s blog post on my company’s internal parenting alias – it is possible somebody found the content interesting and linked to it (this would be the “Interest Group” method outlined above).
  2. I had left a comment on a New York Times article (ahem) about my wife’s blog. Google could have gotten it from there. That would illustrate the “Portal” method of discovery outlined above.

The most important aspects of your product (are the ones costliest to change)

I was recently in a discussion about a version 1 product, where the product had a great user interface, but the software team had pretty much made a mess of everything else (the underlying implementation didn’t scale, it wasn’t cost-effective, etc.). Other participants in the discussion were critical of the product, but I felt that the exceptional user interface made it a worthy product.

No version 1 product is perfect. With version 1 products, teams have much more work to do than available resources, and have to pick what to focus on. The right aspect to focus on is usually the interface to the user, even at the expense of other aspects of the system. For consumer products, the user interface is what the consumer cares most about. It is the factor most responsible for a product’s image. Changing it later on is costly, because it causes cognitive disruption to users. It is necessary to get it right the first time. There’s a reason the CEO of Apple is deeply involved in user interface design – it’s because that’s the most important part of the product.

For a programming language, the most important thing is the interface that the developer sees – i.e., the language design. Language design, if not carefully done, can be impossible to rectify later on (because developers will have written code against the language which will also need to change). Indeed, it’s better to focus fully on language design in version 1, even if other aspects have to be compromised. Java did not succeed because it had a superior compiler, or a superior development environment – it succeeded because it was a well-designed language. All the rest of the plumbing can always be improved later on; but it’s important to get the language right because it is the costliest to change later on.

Yet other times, it’s not the interface at all. For a database system, the database schema is the most important decision. Once data has been stored in a particular schema, it’s virtually impossible to migrate it into a new schema. So it’s important to get that right.

But whatever that aspect is – software teams should pay careful attention to the aspect of their product that is costliest to change later on, and then focus their energies to getting that right – especially in a v1 product, even if it means getting some other things wrong.

Activity Streams are not the next Hyperlinks

A few months back, I saw the following tweet in my feed:

Activity streams are the new SERPs. EdgeRank replaces PageRank as the algo to crack. Read comments in

While I agree that the web will eventually move beyond PageRank and Hyperlinks (for the purposes of relevance), I don’t see how Activity Streams are a viable successor. Hyperlinks have three problems that I detailed in a previous post:

  1. They require content to be modified (to link to other relevant content)
  2. They require content publishers to expend the effort (rather than having readers of content expend the effort)
  3. PageRank is computationally expensive

A better way to do relevance on the web is simple user voting. It works for Amazon, Yelp, Stack Overflow, Seeking Alpha and many other sites. Sure, you have to solve the problem of spammers skewing the voting (more on that at the end of the post), but it fixes all the three problems above. With user voting, content publishers don’t need to modify the content, the readers of the content (not the publishers) expend the effort to vote, and it’s computationally cheaper.
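Part of the appeal is computational: vote-based relevance is just a counter and a sort, with no link graph to crawl and no iterative computation. A minimal sketch (all data invented):

```python
# Minimal sketch of vote-based relevance: readers vote, and ranking is
# a single sort over vote counts -- contrast with PageRank's iterative
# computation over the whole link graph.
votes = {}  # url -> net votes

def vote(url: str, up: bool = True) -> None:
    votes[url] = votes.get(url, 0) + (1 if up else -1)

def top(n: int) -> list:
    return sorted(votes, key=votes.get, reverse=True)[:n]

vote("a.example/post"); vote("a.example/post")   # two upvotes
vote("b.example/post")                           # one upvote
vote("c.example/post", up=False)                 # one downvote

print(top(2))  # → ['a.example/post', 'b.example/post']
```

Note that the publisher never modifies the content: the voting lives entirely on the reader's side, which is exactly the property hyperlinks lack.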

The problem with Activity Streams (and the Facebook Like platform) is that they are not lightweight enough. Activity Streams require users to establish long-term relationships with the publisher, and users don’t always want to do that. The fact that I liked one of Tony Delgrosso’s tweets on Twitter does not mean I want to follow him (and commit to receiving future content from him).

An additional problem with the Facebook “Like” platform is that, like Hyperlinks, it requires content publishers to make changes to their content (to incorporate the “like” button). This is a significant barrier to its adoption. A much better model is Twitter’s (augmented with a “like/digg” feature), where users exchange links to content and can like/digg them, thereby voting on the relevance of the underlying content.

Post Script: Earlier in the post, I alluded to the problem of spam users in any voting-based system. Most user-voting systems have handled spam fairly well (Yelp filters reviews, Twitter does a good job with spammers, etc.) – but I think there is a need for a system like Facebook Connect that includes user reputation as part of login identity. I do think the problem is solvable, though.

Facebook: Social Network King, Content Network Aspirant?

Some believe that Facebook’s recent privacy changes are an attempt to make more money by sharing this data with other businesses, or by presenting more targeted ads. But Facebook could have achieved the same by leaving privacy as it was, and just changing the Terms of Use to indicate that they could share the data with other businesses (for purposes of advertising).

I think Facebook’s motivation in moving to less private defaults is more ambitious. They don’t want to be just a keep-in-touch network, but also a content discovery network.

Content Discovery Networks (Flickr, Twitter, Yelp):
  • Optimized for propagation of content; the goal is to allow the best content to surface to the top.
  • Users want their communication to be widely read.
  • Allow formation of new connections with people you may not know in the real world, including one-way connections.
  • Relationships and content are public by default, to make discovery easy.

Keep-in-touch Networks (Facebook, Email, IM):
  • Optimized for non-public communication.
  • Users expect their communication to be readable only by a trusted subset of users.
  • Connections mirror your offline relationships.
  • Relationships and content are not visible to everybody.

All social networks fall at one end or the other of this spectrum.

On one end are content discovery networks (Twitter, Flickr, Yelp). In these networks, engaged user communities share news, ideas and opinions. Content is public by default so it is easily discoverable. You can connect with people you don’t know in the real world. The focus is on enabling users to connect with content most relevant to them. Think of these as specialized search engines.

On the other side of the spectrum are (for lack of better word) keep-in-touch networks (Facebook, Email, IM). The purpose is communication between people as an end in itself. Connections mirror your real world relationships. There is an expectation that communication will not be public (only visible to people you trust). By this definition, Facebook, while being more public than Email/IM, is still a keep-in-touch network.

Keep-in-touch networks will always have more users than content discovery networks. There are far more people interested in non-public communication than people interested in sharing news/ideas/opinions. Even people who want to share ideas or opinions need private communication. Which is why Twitter will never have as many users as Facebook. And yet, because content discovery networks play in the search space (they’re specialized search engines), I suspect content discovery networks are more profitable than keep-in-touch networks.

It’s hard to straddle both ends of the spectrum. Just ask Google, who tried adding a content discovery network (Buzz) to Gmail (which is a keep-in-touch network). Content discovery networks require more public defaults. Keep-in-touch networks require otherwise. It’s tricky to provide mechanisms for both types of communication.

And yet, Facebook is trying to do exactly that. They’ve always had some viral features typical of a content discovery network (you can re-publish a status update from your friend as your own, you can become a “fan” of entities). But most users don’t use Facebook for content discovery – they use it to keep in touch with their social network.

Still, Facebook has kept at it. Facebook wants to be a content discovery network for web pages (as Twitter is). Make no mistake, the recent “Like” platform is an attempt to let users connect with entities that are relevant to them. Of course, to be successful, it needs to be viral, so one user can discover relevant entities (by looking at another person’s “likes”). This motivates the shift towards making a user’s “likes” public (which is also something Facebook did recently).

Will Facebook succeed? What do you think?

Update 11/1/2010: Dave McClure is saying something similar in his post titled: “How to Take down Facebook – Hint: It ain’t Twitter“.