Classification of Pages

To provide tools for building indices and for searching the pages of this site, a page classification scheme was developed that assigns one or more terms to each page. A term, also called a category, a type or a class, describes a subject addressed in a page or a characteristic of the page.

For instance, the category Geometry is used to classify pages on geometry, and 3D Television those that have stereoscopic content to be viewed on 3D TVs.

Categories and sub-categories

As the number of categories grows and the classification is refined, it is natural to expect that some categories will be particular cases of others.

For instance, the category Rectangle is a sub-category of Polygon, which in turn is a sub-category of Geometry. This means that any page classified by Rectangle must be taken as also being classified by Polygon and by Geometry.

In order to simplify indices and ease a search by categories, a structure is imposed on the set of categories so that a page classified by a category $$C$$ is implicitly classified by any category that is more general than $$C$$.

In the last example, any page classified by Rectangle is implicitly classified by Polygon and Geometry.
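The implicit classification can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary `PARENTS` and the function `implied_categories` are hypothetical names, and the three categories are the ones from the example above.

```python
# Hypothetical data: each category maps to its direct, more general categories.
PARENTS = {
    "Rectangle": ["Polygon"],
    "Polygon": ["Geometry"],
    "Geometry": [],
}


def implied_categories(category, parents=PARENTS):
    """Return the category together with every category more general than it."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Climb to the more general categories of the current one.
            stack.extend(parents.get(current, []))
    return seen


print(sorted(implied_categories("Rectangle")))
# ['Geometry', 'Polygon', 'Rectangle']
```

A page explicitly classified by Rectangle would then be treated as classified by every category in the returned set.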

In that structure, usually known as the term hierarchy, each category $$C$$ has direct descendants that are its most general sub-categories, that is, those that are not particular cases of any other sub-category of $$C$$. It is also usual to say that $$C$$ is the father of its direct descendants, which are its children.

In the same example, Rectangle is not a direct descendant of Geometry as Polygon is a sub-category of Geometry and is more general than Rectangle.

With such a hierarchy of categories, the index page that describes a category $$C$$ presents

• links to the pages for its direct descendants, that is, its most general sub-categories
• links to all pages that are explicitly classified by $$C$$, in principle those that are related to $$C$$ and are not exclusively related to one of its sub-categories.

To reach the pages that are implicitly classified by $$C$$ through the index, one has to go through the pages for all its sub-categories, that is, all its descendants in the hierarchy, both direct and indirect.

Note that a classification-based search does not make this separation between explicitly and implicitly classified pages.

Other properties of the term hierarchy

As it does not make sense to have a category that is a particular case of itself, the hierarchy cannot have cycles: a category cannot be a descendant of itself.
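The no-cycles requirement can be checked mechanically. The following is a minimal sketch using a depth-first traversal. The function name `has_cycle` and the sample `children` mapping are assumptions made for illustration, not the site's actual data.

```python
def has_cycle(children):
    """Return True if some category is a descendant of itself."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for child in children.get(node, []):
            child_state = state.get(child, WHITE)
            if child_state == GREY:
                # The child is already on the current path: a cycle.
                return True
            if child_state == WHITE and visit(child):
                return True
        state[node] = BLACK
        return False

    return any(
        state.get(node, WHITE) == WHITE and visit(node)
        for node in list(children)
    )


# Hypothetical hierarchy: each category maps to its children.
children = {
    "Geometry": ["Polygon"],
    "Polygon": ["Rectangle"],
}
print(has_cycle(children))  # False

# Making Geometry a child of Rectangle would introduce a cycle.
broken = dict(children, Rectangle=["Geometry"])
print(has_cycle(broken))  # True
```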

It is quite common to have a category that is a child of two or more categories.

An example is Mechanism, which may be addressed from the standpoint of its geometry and of its history, and is therefore a sub-category of both Geometry and History of Mathematics.

A consequence of this is that there will be cases of categories and pages that are referred to in the index in several different contexts.

Another detail concerns the need to distinguish different relationships between categories and pages. The hierarchy developed for the classification of Atractor pages makes it possible to treat differently the pages under a certain category that are unequivocally classified by it and those that are only weakly related to it.

More formally

In simple classification schemes the structure for the set of categories may be a (directed) tree: a single category is more general than all the others and is taken as the tree root, and every other category has a single direct ancestor (or father). Therefore there is a single path from the root to each category. When there are categories with more than one direct ancestor, the structure becomes a directed acyclic graph (or acyclic digraph).

In general, the set of categories is associated with a directed acyclic graph that is induced by the binary relation "is a most general sub-category of": if $$\cal D$$ is the set of sub-categories of $$C$$, then $$S\in \cal D$$ is a most general sub-category of $$C$$ if and only if there is in $$\cal D$$ no other element $$S'$$ that is more general than $$S$$. In terms of the hierarchy, the direct descendants of the vertex $$C$$ are exactly the vertices of the most general sub-categories of $$C$$.
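As an illustration of this definition, the sketch below derives the direct descendants of a category from the full set of its sub-categories. It assumes that `SUBCATEGORIES` (a hypothetical name) stores, for each category, all of its sub-categories, direct and indirect. The helper `direct_descendants` is likewise hypothetical.

```python
# Hypothetical data: for each category C, the set D of ALL its sub-categories.
SUBCATEGORIES = {
    "Geometry": {"Polygon", "Rectangle"},
    "Polygon": {"Rectangle"},
    "Rectangle": set(),
}


def direct_descendants(category, sub=SUBCATEGORIES):
    """Return the elements S of D that are not sub-categories of any other S' in D."""
    d = sub[category]
    return {
        s
        for s in d
        if not any(s in sub[other] for other in d if other != s)
    }


print(direct_descendants("Geometry"))  # {'Polygon'}
print(direct_descendants("Polygon"))   # {'Rectangle'}
```

Rectangle is dropped from the direct descendants of Geometry because Polygon, another element of the same set, is more general than it.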

The classification associates with each vertex $$C$$ the set $$\cal P$$ of pages explicitly classified by $$C$$. In the index, the page for $$C$$ points to the pages for the direct descendants of $$C$$ and to the pages in $$\cal P$$. A search engine based on the classification will find for $$C$$ the union of $$\cal P$$ with the corresponding sets of all the vertices in the sub-hierarchy rooted at $$C$$.
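The difference between what the index shows for $$C$$ and what a search returns can be sketched as follows. All names here (`CHILDREN`, `EXPLICIT`, `index_entry`, `search`) and the page identifiers are invented for the example. The actual index and search engine may be organised differently.

```python
# Hypothetical hierarchy and explicit classification.
CHILDREN = {
    "Geometry": ["Polygon"],
    "Polygon": ["Rectangle"],
    "Rectangle": [],
}
EXPLICIT = {
    "Geometry": {"geometry-intro"},
    "Polygon": {"tilings"},
    "Rectangle": {"golden-rectangle"},
}


def index_entry(category):
    """What the index page for a category links to: children and explicit pages."""
    return list(CHILDREN[category]), set(EXPLICIT[category])


def search(category):
    """Union of the explicit page sets over the sub-hierarchy rooted at category."""
    found, seen, stack = set(), set(), [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a category with several parents is visited only once
        seen.add(current)
        found |= EXPLICIT.get(current, set())
        stack.extend(CHILDREN.get(current, []))
    return found


print(index_entry("Geometry"))     # (['Polygon'], {'geometry-intro'})
print(sorted(search("Geometry")))  # ['geometry-intro', 'golden-rectangle', 'tilings']
```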

To allow for different relationships between $$C$$ and the pages explicitly classified by $$C$$, the set $$\cal P$$ is assumed to have subsets, each one associated with one kind of relationship. A further constraint can be that those subsets are necessarily disjoint, making up a partition of $$\cal P$$.
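A small check of the disjointness constraint might look like the sketch below. The relationship names `"strong"` and `"weak"`, the page identifiers and the function `is_partition` are assumptions made for illustration. Note that this check tolerates empty subsets, which a strict definition of partition would exclude.

```python
def is_partition(pages, subsets):
    """Check that the subsets are pairwise disjoint and together cover pages."""
    union = set()
    total = 0
    for subset in subsets.values():
        union |= subset
        total += len(subset)
    # Disjoint subsets have sizes that add up to the size of their union.
    return union == pages and total == len(union)


# Hypothetical set P for some category, split by kind of relationship.
pages = {"tilings", "polygon-area", "star-history"}
subsets = {
    "strong": {"tilings", "polygon-area"},
    "weak": {"star-history"},
}
print(is_partition(pages, subsets))  # True

# A page listed under both relationships breaks the constraint.
overlapping = {
    "strong": {"tilings", "polygon-area"},
    "weak": {"tilings", "star-history"},
}
print(is_partition(pages, overlapping))  # False
```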