A Short History of Content Discovery on the Web

My wife’s blog is indexed by Google. I am not sure how that happened.

She didn’t ask for it to be indexed. So how did the blog get into Google’s index? OK, so maybe Google followed a link to my wife’s blog from a web-page already in its index. But that means the owner of that web-page would have had to discover the blog somehow (in order to add a hyperlink to it). So how exactly would a web-page owner discover the blog? (The answer is at the end of the post.)

Nobody thinks too much about this, but how do users (and search engines) discover content on the web?

Remember that “discovering” content through a search engine query doesn’t count as content discovery; that’s really content rediscovery. Before a user can find new content in search results, the search engine itself has to discover that content. Perhaps surprisingly, even a search engine’s discovery of new content by following links is not content discovery in this sense. Before a search engine can follow a link to new content, some web-page owner had to discover that content and add a hyperlink to it. I’m more interested in that fundamental process by which newly published content is first discovered by other users.
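
To make that concrete, here is a minimal sketch of link-following discovery. It is a toy, not how any real search engine works: the page names, the `LINKS` graph, and the `crawl` function are all made up for illustration. The point is only that a crawler starting from known pages cannot reach a page until somebody links to it.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "portal.example": ["news.example", "friend.example"],
    "news.example": ["portal.example"],
    "friend.example": [],
    "wifes-blog.example": ["portal.example"],  # links out, but nobody links in
}


def crawl(seeds, links):
    """Return every page reachable from the seed pages by following links."""
    discovered = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
    return discovered


# The crawler starts from a well-known page and never reaches the blog.
before = crawl(["portal.example"], LINKS)

# A human discovers the blog some other way and adds a hyperlink to it.
LINKS["friend.example"].append("wifes-blog.example")

# Only now can the crawler "discover" the blog.
after = crawl(["portal.example"], LINKS)
```

In this sketch the blog is missing from `before` and present in `after`; the only thing that changed in between is a person adding a link.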

I count three fundamental methods by which users (and search engines) discover content:

  1. Portals: Users go to a well-known website (like the New York Times) and find new content there. Search engines may also keep a list of well-known websites to index by default.
  2. Social Network: 10 years ago, this meant email. A content publisher might send an email to all his friends. If the content was good enough, it might get forwarded and linked to by a wider set of people. Today, it encompasses Facebook and Twitter.
  3. Interest Groups: I use this term to mean groups of people interested in the same thing getting together to exchange information. 10 years ago, this meant distribution lists and Usenet groups. Today, it means Yelp, Twitter, etc.

Note also that I do not consider advertising to be a significant mechanism for content discovery. The primary purpose of advertising, in my view, is for discovery of products and services available in the real world; it is rarely used to advertise information available online.

It is interesting to list out how each of the three mechanisms of content discovery has evolved to deal with the higher rate of content generation on the web. Notice how Twitter has innovated on two of the content discovery mechanisms (while Facebook has innovated on only one). This is why I believe that it is hard to beat Twitter as a content discovery network.

Mechanism Evolution
Portals 10 Years Ago:
Portal content used to be editorially managed. A small group of (paid) writers would submit articles for review to the editors (typically salaried employees), who would then approve them for publication on the portal.

Today:
Using a small group of paid employees to exert editorial control does not scale. The most popular portals (like Yelp) rely on crowd-sourced content. A community of users not only generates content, but also reviews and validates it. Even the New York Times crowd-sources some content by allowing users to comment on articles.

Social Network 10 Years Ago:
Email was pretty much the only alternative. If you were a content publisher, you’d send an email out to your friends, and if they found it interesting, they would forward it further. Many successful portals (Angie’s List, for example) actually started out that way.

Today:
Facebook (and, to a lesser extent, Twitter) has made it easy to inform your social network about interesting content. Both Facebook and Twitter also make it easier to re-publish or retweet (the equivalent of email forwards) interesting content from someone in your social network.

Interest Groups 10 Years Ago:
Usenet groups were the primary mode of communication among users with similar interests. This was cumbersome in that you actually had to get approval to create a Usenet group. A secondary approach was to create a distribution list.

Today:
The web has multiple ways of organizing interest groups. Do you want to participate in the interest group for a local business? Head over to Yelp (all reviews for a local business constitute an interest group). Do you want to participate in the interest group for a product? Head over to the reviews for that product on Amazon. Interest groups have gotten much more fine-grained and ubiquitous.

This is one space that Twitter, more than anyone else, has revolutionized (Facebook, not so much). Twitter has made it easy to create an interest group for content, in two ways:

  1. First, each person, together with his/her followers, forms a virtual interest group. I have found myriad interesting users through retweets, and I have ended up following their content (in essence, subscribing to a distribution list of their content).
  2. Second, Twitter’s search mechanism (hashtags) allows content streams to be instantly organized into interest groups. For instance, when I follow the hashtag #obama on Twitter, I have the means to track content related to President Obama. Where users were forced to post to multiple Usenet groups 10 years ago, a single tweet with multiple hashtags suffices on Twitter today.
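
As a rough illustration of the hashtag idea, here is a small sketch that files each tweet under every hashtag it carries. This is not Twitter’s implementation; the `group_by_hashtag` function and the sample tweets are invented for the example, and matching hashtags case-insensitively is my own assumption.

```python
import re
from collections import defaultdict

# Simplified hashtag pattern: '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the list of tweets that mention it."""
    groups = defaultdict(list)
    for tweet in tweets:
        # A set avoids filing the same tweet twice if it repeats a hashtag.
        tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
        for tag in sorted(tags):
            groups[tag].append(tweet)
    return dict(groups)


# Made-up sample tweets; the first one lands in two interest groups at once.
tweets = [
    "Town hall tonight #obama #healthcare",
    "New clinic opens downtown #healthcare",
    "Press conference at noon #Obama",
]
groups = group_by_hashtag(tweets)
```

The first tweet ends up in both the `obama` and `healthcare` groups, which is the single-post equivalent of cross-posting to two Usenet groups.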

Oh, and about how my wife’s blog got into Google’s index. It could be one of two ways:

  1. I have often referenced my wife’s blog on my company’s internal parenting alias, so it is possible somebody there found the content interesting and linked to it (this would be the “Interest Group” method outlined above).
  2. I once left a comment (ahem) mentioning my wife’s blog on a New York Times article. Google could have picked it up from there. That would illustrate the “Portal” method of discovery outlined above.

One response to “A Short History of Content Discovery on the Web”

  1. The answer is you. You let the bird out. 🙂
