Markov chains: PageRank and many others
What do Google’s PageRank, procedural music composition, text generation, speech and image recognition, among many, many others have in common?
Yes, you guessed it (it was easy as the title says it ;) : Markov chains are a basic element in their respective algorithms. (Surely, there are other approaches for some of these problems, but Markov chains are widely used).
Markov chains
So, what are Markov chains? They are a process, a sequence of states, where at a given time the next state is chosen probabilistically based only in the current state. That is, they don’t have any memory longer than in which state they are at the moment.
Let’s see a simple example of a Markov chain using a weather prediction system. To simplify it, let us assume that a day has only 2 possible weather states, sunny and rainy, and they last all day. We know that if today is sunny, tomorrow will be sunny with probability 0.9 and rainy with probability 0.1. If today is rainy, tomorrow can be sunny or rainy, both with probability 0.5.
That can be represented with a directed graph with weights on the arcs:

Graph modeling the weather system
Any Markovian system can be also represented by a transition matrix. In this matrix, every entry can be read as the probability of going from the state represented by the row to the state represented by the column. The matrix related to our system is:

Weather system associated matrix
This matrix has some nice properties. For example:
- P to the power of n represents the probabilities of going from a given state to another in n steps.
- If you multiply a vector representing the probabilities of being at a given state in a given moment by the matrix P, you get the probabilities of being at every step one step later. So, if we know that before we start we are on a sunny day, the first day will be sunny with probability 0.9 and rainy with probability 0.1, and if we multiply that by the matrix again, we get that the second day will be sunny with probability 0.86 and rainy with probability 0.14
- If the system has the following two properties
- You can go from every state to every other state.
- The number of steps needed to go from any state to the same state does not have to be necessarily multiple of a number different from 1.
then, as you advance steps the resulting vector converges towards what is known as steady-state or stationary distribution, which is independent from the start input. That is, there exists a number n of steps after which the probability of being in each step does not depend on the start distribution, and does not vary when advancing more steps. In our example, the probabilities of being in a sunny day in the long term are 0.833, and the probabilities of being in a rainy day are 0.167.
If you want to see more in-depth the mathematic operations to find those values, you can find them here.
Pagerank
So, how does this apply to PageRank algorithm? As you would probably know, PageRank is the algorithm used by Google Search to try to measure the importance of a webpage. As a curiosity, as the algorithm was developed by Larry Page while doing a Ph.D in Stanford University, the process was patented by the university and Google had to pay them an amount of shares sold later for $336 million in order to use the algorithm.
What PageRank (really, really simplified) does is to deal with the web as a directed graph, where each page represents a node and each link represents an edge. From a given page, all its links have the same weight. The solution to the page importance problem is the steady-state of the system. In it, probabilities of being at a given page in the long term are a function of how many pages link there and the probability of being at those pages (their importance). That is, a link to another page is like a vote, and the value of a vote is at the same time the sum of the values of the votes the linking page has.
Other applications
Although PageRank is currently a really important algorithm, it would be unfair to say that all what Markov chains can offer is that. Some other applications are:
- Text generators: Some text is feed as input, and a graph reflecting the partial order between words is built. Then, a random walk is done while generating a new text. A variation is not to consider the state as the current word only but the last n words. Texts generated this way do not usually make sense, but you have to read a few words to realize that. Another approach is not to use words as states but characters, which produces invented words most of the time. Some examples of Markov text generators are:
- An online generator
- Chat bots. Kooky is chat bot learning from many IRC channels and generating replies based on what it have received. It produces great responses, worth a read! Mark V Shaney is another example of Markov based IRC bot.
- Spam generation: Some spammers use Markov text generators to generate unique emails trying to not be caught by spam filters.
- Spam filtering: Yes, Markov chains are on both sides of the spam battle. It is not only used by the spammers but also by the filters [PDF].
- Ray-tracing: Markov chains are used in Metropolis Light Transport algorithm.
Of course, this list is not exhaustive. If you are interested in knowing more applications, Wikipedia holds a larger list.

