Entropy

As described earlier, the entropy is computed using the formula \[E=-\sum_{i,j=1}^{n}x_{j}p_{ij}\log_{2}\left(p_{ij}\right),\] or, equivalently, by \[E=-\sum_{j=1}^{n}x_{j}\left(\sum_{i=1}^{n}p_{ij}\log_{2}\left(p_{ij}\right)\right)=\sum_{j=1}^{n}q_{j}x_{j},\] where \[q_{j}=-\sum_{i=1}^{n}p_{ij}\log_{2}\left(p_{ij}\right).\]
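To make the two expressions concrete, here is a minimal sketch (not from the original text) that computes the entropy both ways for a hypothetical column-stochastic Google matrix `P`, whose entry \(p_{ij}\) is the probability of moving from page \(j\) to page \(i\), and a PageRank vector `x`; the names and setup are illustrative assumptions.

```python
import numpy as np

def entropy_direct(P, x):
    """E = -sum_{i,j} x_j * p_ij * log2(p_ij), assuming every p_ij > 0."""
    return -float(np.sum(x[None, :] * P * np.log2(P)))

def entropy_via_q(P, x):
    """E = sum_j q_j * x_j, with q_j = -sum_i p_ij * log2(p_ij)."""
    q = -np.sum(P * np.log2(P), axis=0)   # column entropies q_j
    return float(q @ x)
```

Both functions assume every entry \(p_{ij}\) is strictly positive, which holds for the damped model since \(p_{ij}\geq\frac{1-p}{n}>0\).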

For each \(j\in\left\{ 1,\ldots,n\right\} \), the value of \(q_{j}\) depends on the number of links included in page \(j\). If page \(j\) includes \(s_{j}\) links, with \(s_{j}\neq0\), then \(s_{j}\) of the probabilities \(p_{ij}\) are equal to \(\frac{1-p}{n}+\frac{p}{s_{j}}\), and the remaining \(\left(n-s_{j}\right)\) are equal to \(\frac{1-p}{n}\). Hence \[\begin{array}{ccl} q_{j} & = & -s_{j}\left(\frac{1-p}{n}+\frac{p}{s_{j}}\right)\log_{2}\left(\frac{1-p}{n}+\frac{p}{s_{j}}\right)-\left(n-s_{j}\right)\frac{1-p}{n}\log_{2}\frac{1-p}{n}=\\ & = & -p\left(\frac{s_{j}\left(1-p\right)}{np}+1\right)\log_{2}\left(1+\frac{np}{s_{j}\left(1-p\right)}\right)-\log_{2}\frac{1-p}{n}=\\ & = & pf\left(\frac{1-p}{np}s_{j}\right)-\log_{2}\frac{1-p}{n}, \end{array}\] where \(f(x)=-\left(x+1\right)\log_{2}\left(1+\frac{1}{x}\right)\), for \(x>0\). This function \(f\) is strictly increasing, so the larger the number of links \(s_{j}\), the larger the value of \(q_{j}\). Hence, the largest possible value for \(q_{j}\) is \[Q=-n\cdot\frac{1}{n}\log_{2}\frac{1}{n}=\log_{2}n,\] reached only when the web page includes links pointing towards every one of the existing pages \(\left(s_{j}=n\right)\), or when it includes no links at all \(\left(s_{j}=0\right)\). In both cases, we easily compute that \(p_{ij}=\frac{1}{n}\) for every \(i\in\left\{ 1,\ldots,n\right\} \).
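As a sanity check on this closed form, the following sketch (with arbitrarily chosen \(n\) and \(p\), not taken from the text) compares the direct column entropy with \(pf\left(\frac{1-p}{np}s_{j}\right)-\log_{2}\frac{1-p}{n}\) and illustrates that \(q_{j}\) grows with \(s_{j}\) up to \(Q=\log_{2}n\).

```python
import numpy as np

n, p = 50, 0.85   # illustrative values

def q_direct(s):
    # Column entropy of a page with s links: s entries equal to (1-p)/n + p/s,
    # the remaining n - s entries equal to (1-p)/n.
    probs = [(1 - p) / n + p / s] * s + [(1 - p) / n] * (n - s)
    return -sum(pr * np.log2(pr) for pr in probs)

def f(x):
    return -(x + 1) * np.log2(1 + 1 / x)

def q_closed(s):
    return p * f((1 - p) / (n * p) * s) - np.log2((1 - p) / n)

for s in (1, 2, 10, n):
    assert np.isclose(q_direct(s), q_closed(s))
# q_j increases with the number of links and tops out at Q = log2(n).
assert q_direct(1) < q_direct(2) < q_direct(10) < q_direct(n)
assert np.isclose(q_direct(n), np.log2(n))
```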

Concerning the least possible value for \(q_{j}\), the argument above implies that it is reached when the page includes exactly one link \(\left(s_{j}=1\right)\). In that case the \(p_{ij}\) values are all equal to \(\frac{1-p}{n}\), except one, which is equal to \(\frac{1-p}{n}+p\). Hence the lowest possible value for \(q_{j}\), which we denote by \(q\), is \[\begin{array}{ccl} q & = & -\left(\frac{1-p}{n}+p\right)\log_{2}\left(\frac{1-p}{n}+p\right)-\left(n-1\right)\cdot\frac{1-p}{n}\log_{2}\frac{1-p}{n}=\\ & = & -p\left(\frac{1-p}{np}+1\right)\log_{2}\left(1+\frac{np}{1-p}\right)-\log_{2}\frac{1-p}{n}. \end{array}\]
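For instance, with \(n=4\) pages and \(p=0.85\) (values chosen purely for illustration), these expressions evaluate to \(q\approx0.686\) and \(Q=\log_{2}4=2\).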

Remark that the highest and lowest values for the coefficients \(q_{j}\), namely \(Q\) and \(q\), do not depend on the particular web page; that is, for every \(j\in\left\{ 1,\ldots,n\right\}\), we have \(q\leq q_{j}\leq Q\). It thus follows that \[\sum_{j=1}^{n}qx_{j}\leq\sum_{j=1}^{n}q_{j}x_{j}\leq\sum_{j=1}^{n}Qx_{j}.\]

Moreover, since the PageRank values \(x_{j}\) add up to \(1\), we have \[\sum_{j=1}^{n}qx_{j}=q\sum_{j=1}^{n}x_{j}=q\cdot1=q\] and \[\sum_{j=1}^{n}Qx_{j}=Q\sum_{j=1}^{n}x_{j}=Q\cdot1=Q.\]

Hence, we conclude that \[q\leq E\leq Q.\]
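These bounds can be checked numerically. The sketch below (a hypothetical random link structure, with illustrative \(n\) and \(p\)) builds the damped transition matrix, computes its PageRank vector by power iteration, and verifies that the resulting entropy lies between \(q\) and \(Q\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 0.85                                   # illustrative values

# Random adjacency: A[i, j] = 1 if page j links to page i; force >= 1 link per page.
A = (rng.random((n, n)) < 0.2).astype(float)
A[rng.integers(n, size=n), np.arange(n)] = 1.0
s = A.sum(axis=0)                                 # number of links s_j of each page
P = (1 - p) / n + p * A / s                       # p_ij, column-stochastic by construction

# PageRank vector x: fixed point of x = P x with sum(x) = 1 (power iteration).
x = np.ones(n) / n
for _ in range(500):
    x = P @ x

q_cols = -np.sum(P * np.log2(P), axis=0)          # coefficients q_j
E = float(q_cols @ x)                             # system entropy
q = -((1 - p) / n + p) * np.log2((1 - p) / n + p) - (n - 1) * (1 - p) / n * np.log2((1 - p) / n)
Q = np.log2(n)
assert q <= E <= Q
```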

We have proved that \(q\) is the lowest possible value for the system's entropy, reached when each page includes exactly one link, while \(Q\) is the largest possible value, attained when every page includes either links to all existing pages or no links at all. The first situation, each page including exactly one link, corresponds to the least possible uncertainty about the surfing path the user will follow. Indeed, as at each step there is exactly one choice with a higher probability of being chosen, the sequence of visited pages is as predictable as possible. The latter case corresponds to the highest possible uncertainty about the sequence of visited pages, as in this case the next page to be visited is chosen uniformly at random.

Remarks:

- It is not always true that the entropy increases with the number of links included in a web page, as this may decrease the PageRank values of pages that include a larger number of links.

- For a fixed value of the parameter \(p\), both the highest value \(Q\) and the lowest value \(q\) for the entropy increase with the number of existing pages \(n\).

- For a fixed number of existing web pages \(n\), the larger the parameter \(p\), the smaller \(q\) will be, and vice versa. As \(p\) approaches \(0\), the value of \(q\) approaches the highest possible value of the entropy, \(Q\). As \(p\) approaches \(1\), the value of \(q\) approaches \(0\).

- The highest possible value for the entropy \(Q\) does not depend on the value of the parameter \(p\).

- Regardless of the configuration, when the parameter \(p\) approaches \(0\), the values of all the \(q_{j}\) approach \(Q\), that is, the system's entropy approaches its highest possible value. On the other hand, when the parameter \(p\) approaches \(1\), the value of \(q_{j}\) approaches \(\log_{2}\left(s_{j}\right)\). Hence, if every page includes exactly one link, the system's entropy will approach \(0\) (see the numerical sketch after these remarks).

- Even if some web pages include more than one link, the system's entropy will still approach \(0\) as the parameter \(p\) approaches \(1\), provided the PageRank values of those pages approach \(0\).
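The limiting behaviour described in these remarks can be observed numerically; the sketch below (with arbitrary \(n\) and \(s_{j}\), chosen only for illustration) evaluates a single coefficient \(q_{j}\) for values of \(p\) close to \(0\) and close to \(1\).

```python
import numpy as np

n, s_j = 100, 5    # illustrative values

def q_j(p):
    probs = [(1 - p) / n + p / s_j] * s_j + [(1 - p) / n] * (n - s_j)
    return -sum(pr * np.log2(pr) for pr in probs)

print(q_j(1e-6), np.log2(n))        # as p -> 0, q_j approaches Q = log2(n) ~ 6.64
print(q_j(1 - 1e-9), np.log2(s_j))  # as p -> 1, q_j approaches log2(s_j) ~ 2.32
```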