How to compute

How to compute PageRank?

Imagine we have a set of \(n\) web pages with links between them and no links to other web pages. For each the web page \(P_{j}\), we represent by \(PR\left(P_{j}\right)\) its PageRank value and by \(C\left(P_{j}\right)\) the number of links starting in \(P_{j}\).

The original PageRank formula, developed by the Google founders themselves (Larry Page e Sergey Brin), is: \[PR\left(P_{i}\right)=\left(1-p\right)+p\left(\frac{PR\left(P_{j1}\right)}{C\left(P_{j1}\right)}+\frac{PR\left(P_{j2}\right)}{C\left(P_{j2}\right)}+...+\frac{PR\left(P_{jk}\right)}{C\left(P_{jk}\right)}\right)\] where

  • \(p\) is a parameter chosen between \(0\) and \(1\) (usually \(p=0,85\))
  • \(PR\left(P_{i}\right)\) represents the PageRank value of the web page \(P_{i}\)
  • \(j_{1},j_{2},...,j_{k}\) are the indices of the other pages that include a link ending on the web page \(P_{i}\)
  • \(PR\left(P_{j1}\right),PR\left(P_{j2}\right),...,PR\left(P_{jk}\right)\) represent the PageRank values of those web pages
  • \(C\left(P_{j1}\right),C\left(P_{j2}\right),...,C\left(P_{jk}\right)\) represent the number of links in those web pages

That is, to the defined minimum PageRank value of a web page, given by \(1-p\), we add terms depending on how the web page \(P_{i}\) relates to the other web pages in the Internet. Thus, the PageRank value of a web page is uniformly transmitted to all the links pointing to the page, with a weight described by the parameter \(p\).

A link pointing fom the web page \(A\) towards the web page \(B\) may be interpreted as a "vote" of the page \(A\) supporting the page \(B\).

It follows from this interpretation that, for a web page to have a high PageRank value it is not enough that there are many other pages with links pointing to it. This should be complemented by high PageRank values of these web pages, which should share their PageRank values with a small number of other web pages (as the PageRank value transmitted by a web page is shared by all the links towards the page, this means that being the end point of a web page with say \(5\) links is easily more advantageous than being the end point of a web page with a a higher PageRank value but including a large say \(100\) links starting from it).

According to this formula, the PageRank value of a web page depends on the PageRank values of all pages that include link pointing towards the given page. Hence, how can we actually compute this value since at the beginning we do not know the PageRank value of any web page?

Remark: In the original defining PageRank, \[PR\left(P_{i}\right)=\left(1-p\right)+p\left(\frac{PR\left(P_{j1}\right)}{C\left(P_{j1}\right)}+\frac{PR\left(P_{j2}\right)}{C\left(P_{j2}\right)}+...+\frac{PR\left(P_{jk}\right)}{C\left(P_{jk}\right)}\right)\] we may divide both sides by \(n\), to obtain \[\frac{PR\left(P_{i}\right)}{n}=\left(\frac{1-p}{n}\right)+p\left(\frac{PR\left(P_{j1}\right)}{nC\left(P_{j1}\right)}+\frac{PR\left(P_{j2}\right)}{nC\left(P_{j2}\right)}+...+\frac{PR\left(P_{jk}\right)}{nC\left(P_{jk}\right)}\right)\] or, \[PR^{*}\left(P_{i}\right)=\left(\frac{1-p}{n}\right)+p\left(\frac{PR^{*}\left(P_{j1}\right)}{C\left(P_{j1}\right)}+\frac{PR^{*}\left(P_{j2}\right)}{C\left(P_{j2}\right)}+...+\frac{PR^{*}\left(P_{jk}\right)}{C\left(P_{jk}\right)}\right)\] where, for each web page \(P_{j}\), \[PR^{*}\left(P_{j}\right)=\frac{PR\left(P_{j}\right)}{n}.\]

It is sometimes convenient to take \(PR^{*}\) as the PageRank value instead of \(PR\). Indeed, as for each web page \(P_{j}\) we have that \(0<PR\left(P_{j}\right)\leq n\), it follows that \(0<PR^{*}\left(P_{j}\right)\leq1\). Hence, with this alternative definition for the PageRank, its value will always be between \(0\) and \(1\), regardless of the number of pages considered.