1st Example

Imagine we have three web pages \(A\), \(B\) and \(C\) such there is only one link from page \(A\) pointing to page \(B\) and another link from page \(B\) pointing to page \(C\). Using the PageRank formula with \(p=0,85\), we find

\(\begin{array}{ccl} PR\left(A\right) & = & 0,15\\ PR\left(B\right) & = & 0,15+0,85\times PR\left(A\right)=0,15+0,85\times0,15=0,2775\\ PR\left(C\right) & = & 0,15+0,85\times PR\left(B\right)=0,15+0,85\times0,2775=0,385875 \end{array}\)

We were able to easily find the exact PageRank values for each web page.

Note that \(PR\left(A\right)<PR\left(B\right)<PR\left(C\right)\).

2nd Example

we have now three pages \(A\), \(B\) and \(C\) such that one link pointing from \(A\) to \(B\), one link pointing from \(B\) to \(C\) and one link pointing from \(C\) to \(A\). Using the PageRank formula with \(p=0,85\), we find

\[PR\left(A\right)=0,15+0,85\times PR\left(C\right)\] \[PR\left(B\right)=0,15+0,85\times PR\left(A\right)\] \[PR\left(C\right)=0,15+0,85\times PR\left(B\right)\]

Now we can not immediately find the exact values for the PageRank for each page. We need to solve the simultaneous equations with respect to \(PR\left(A\right), PR\left(B\right)\) and \(PR\left(C\right)\). It is easily verified that the solution is \[PR\left(A\right)=PR\left(B\right)=PR\left(C\right)=1.\]

As expected, all the three pages have equal PageRank value. You may try other examples using this app, where you may play with at most \(6\) pages (\(A\), \(B\), \(C\), \(D\), \(E\) and \(F\)), and define how they are linked to each other.

In general , to compute the PageRank value for a set of \(n\) web pages linked to each other, we will need to simultaneously solve \(n\) equations (the equations for the PageRank for each page) with \(n\) unknowns (the PageRank values for reach page). When \(n\) is small, this is not a problem. However, the Google database includes several millions of web pages, and this raises difficulties on finding the exact solution for the simultaneous equations. In such case, the approach consists on defining an iterative procedure to find approximate values for the PageRank, the approximation becoming progressively better each time we perform a new iteration.

Let us look at an example considering the three web pages \(A\), \(B\) and \(C\), and links as described in the 2nd example, taking as initial approximations the value \(1\) for the three PageRank values. Using the PageRank formula for each page, we find

\[PR\left(A\right)=0,15+0,85\times PR\left(C\right)=0,15+0,85\times1=1\] \[PR\left(B\right)=0,15+0,85\times PR\left(A\right)=0,15+0,85\times1=1\] \[PR\left(C\right)=0,15+0,85\times PR\left(B\right)=0,15+0,85\times1=1.\]

That is, we found the same PageRank values, all remaining equal to \(1\). This happened because we chose the initial values equal to the exact PageRank value for each page that we had been able to compute previously.

What happens if our initial guesses for the PageRank values were different from the exact ones? To have an idea of what will happen, choose our initial guesses all equal to \(0,5\). Using the PageRank formula, we find

\[PR\left(A\right)=0,15+0,85\times PR\left(C\right)=0,15+0,85\times0,5=0,575\] \[PR\left(B\right)=0,15+0,85\times PR\left(A\right)=0,15+0,85\times0,5=0,575\] \[PR\left(C\right)=0,15+0,85\times PR\left(B\right)=0,15+0,85\times0,5=0,575\].

Using the PageRank formula again with the values obtain above, we now find

\[PR\left(A\right)=0,15+0,85\times PR\left(C\right)=0,15+0,85\times0,575=0,63875\] \[PR\left(B\right)=0,15+0,85\times PR\left(A\right)=0,15+0,85\times0,575=0,63875\] \[PR\left(C\right)=0,15+0,85\times PR\left(B\right)=0,15+0,85\times0,575=0,63875.\]

Now we may repeat this procedure as long we like. After \(10\) repetitions, or iterations, of this procedure, we find

\[PR\left(A\right)=PR\left(B\right)=PR\left(C\right)=0,90156...,\]

while, after \(67\) iterations, we find the approximations

\[PR\left(A\right)=PR\left(B\right)=PR\left(C\right)=0,99999....\]

We see that the computed approximations for the PageRank values are getting closer to the exact ones. This will happen regardless of our initial guess for the PageRank for each web page (for this example with three web pages, if we start with an initial guess equal to \(1000\) for each page, after performing \(100\) iterations the approximation obtained is \(1,00044\) for each page). Moreover, this behaviour for the approximations constructed will happen for any other set of web pages and links among them. It would be nice to have an explanation (and a proof) for this observation.


Remark: The iterative procedure described may be interpreted as an application of an iterative algorithm to solve simultaneous equations, known as the Jacobi method. There is an alternative iterative method, known as the Gauss-Seidel method, differing from the described one by immediately using the newly calculated approximations in the subsequent calculations. To illustrate how this Gauss-Seidel method works, look again at the second usage of the iterative procedure described above. After choosing the initial guess equal to \(0,5\) for the PageRank for each page, we would find for the first iteration

\(\begin{array}{ccl} PR\left(A\right) & = & 0,15+0,85\times PR\left(C\right)=0,15+0,85\times0,5=0,575\\ PR\left(B\right) & = & 0,15+0,85\times PR\left(A\right)=0,15+0,85\times0,575=0,63875\\ & & (\mbox{using }PR\left(A\right)=0,575\mbox{ instead of }PR\left(A\right)=0,5)\\ PR\left(C\right) & = & 0,15+0,85\times PR\left(B\right)=0,15+0,85\times0,63875=0,69293.\\ & & (\mbox{using } PR\left(B\right)=0,63875 \mbox{ instead of }PR\left(B\right)=0,5) \end{array}\)

In general, the Gauss-Seidel method produces approximations that get closer to the exact values faster than the approximations constructed by the Jacobi method.