Mission and Objectives
The Data Management Lab (DMLab) advances foundational and practical solutions for managing complex, large-scale data. We research, design and validate methods, architectures and tools that ensure data integrity, quality, efficient querying and integration across relational as well as non-relational sources.
Goals
- Prepare data for subsequent use in model-centric learning.
- Develop algorithms and systems for ranking query results according to given preferences.
- Address non-functional aspects such as fairness and diversity in the management of data and queries.
- Enable reasoning capabilities even in the presence of complex data models such as graphs.
- Extend classical relational concepts (triggers, ER schemata, …) to non-relational contexts.
- Promote reproducible research, toolkits, and the education of students in rigorous data management engineering.
Research Areas
Ranking and Preferences
Algorithms and query paradigms for retrieving the most relevant results according to user preferences, whether qualitative or quantitative.
Fairness
Measurement and analysis of fairness in data-intensive applications (ML, recommendation, ranking) to avoid discriminatory outcomes and support ethical decisions, while minimizing adverse effects on performance.
Data Preparation
Application of preprocessing techniques (data transformation, reduction, cleaning, integration, and semantics-driven reconciliation) to prepare data for use in machine learning tasks.
LLM-Enhanced Data Management
Use of large language models to support data management tasks, including the formulation and execution of complex queries and the explanation of their results.
Graph Databases & Complex Data
Extension of complex data models with reactive behavior, integration of graph data with relational and semi-structured sources, and efficient graph query processing patterns. Notable applications include health data and financial systems, among others.
Context-aware Data Management
Adaptation of data management processes to context, i.e., any information that characterizes the situation of entities involved in the interaction between users and applications.