Description
Organizing a growing amount of data and identifying the relevant information in it is a research challenge of increasing importance. This thesis contributes to (semi-)automatic support for this task through personalized hierarchical structuring. The goal is to develop data mining methods that automatically structure a collection into a hierarchy. Since a collection can be structured in several ways, a further goal is to take specific user preferences into account in order to discover the most suitable structure. These structuring preferences are, however, given only implicitly through the user's previously organized personal data. Two fundamentally different types of approaches are studied in detail. First, structuring preferences are given through a fixed, known target hierarchy. This enables the use of hierarchical classification methods. The proposed approaches use a framework that allows any classifier to be combined with information about (hierarchical) class relations when making the final classification. Both methods aim to maximize the user-perceived benefit, which is measured by a user-centric performance evaluation. Several comparative experiments verified that classification performance can be increased significantly above the baseline, especially in more challenging scenarios of high practical relevance. Second, the structuring preferences are relaxed relative to the first, very restrictive case. Instead of requiring a completely specified target hierarchy, preferences can be given for only parts of the data, e.g., by describing a part of the target hierarchy. These preferences are integrated into hierarchical clustering through a novel kind of constraint, the must-link-before constraint. Several methods were developed to integrate these constraints either directly, through metric learning, or through a combination of both. The quality of the individual methods was measured by comparing their results with benchmark data using a novel hierarchy comparison measure. A comprehensive evaluation revealed the advantages and disadvantages of the individual methods and verified that the proposed structuring preferences indeed improve clustering quality.