In the literature a huge amount of methods for clustering and classification have been proposed. However, the vast majority of these approaches rely on object representations given in terms of feature vectors. Such object representations have a number of useful properties.
For example, object similarity, or distance, can be easily computed by means of the Euclidean distance or similar measures, in the n-dimensional real space. Recently, however, a growing interest in graph-based object representation can be observed. As a matter of fact, graph based object representations have a number of advantages over feature vectors. For example, graphs are able to represent not only values of object properties, i.e. features, but can be used to explicitly model structural relations that may exist between different parts of an object.
Obviously, graphs have a higher representational power than feature vectors. On the other hand, a shortcoming of graph representations is a lack of suitable methods for clustering and classification. In fact, almost all methods for data clustering and classification need feature vectors as basic data structures. This is mainly due to the fact that some of the basic operations needed in clustering and classification, for example, computing the sum or weighted sum of two objects, is not available for graphs. There are only a few algorithms for graph clustering and classification available. These algorithms have been carefully 'handcrafted' and are computationally very expensive. Obviously, lacking are a systematic approach and a general methodology that would allow one to transfer classification and clustering algorithms from the vectorial in the graph domain in a principed way.
In the current project we aim at bridging the gap between the domain of feature-based and graph-based object representation in such way that we preserve the best of both approaches, i.e. the high representational power of graphs and the rich algorithmic repository of tools for feature vectors. It is planned to proceed along two lines of research. The first is based on the idea of embedding a population of graphs in the n-dimensional real space by means of graph prototype selection and edit distance computation, while the secondline of research will build upon kernel methods. Kernel methods have seen an enormous amount of attention in pattern recognition and machine learning recently. Kernel methods of the first generation were designed so as to map feature vectors from the n-dimensional real space to dot products in some higher dimensional feature space. Meanwhile, novel kernel methods for structured, non-vectorial data have become available. However, research on these more general kernel methods is still in early phase. Along the second line of research of the current project, we plan to study existing and develop new graph based kernel methods, and apply them to clustering and classification.