The operational procedures of numerous authorities and companies are subject to strict regulations. In practice, it is often difficult to establish whether the rules in place are actually being complied with. We are developing algorithms that can automatically monitor regulatory compliance and handle large volumes of data.

Specifically, we are designing and implementing efficient, parallelised algorithms that monitor whether data in a dynamic environment comply with given rules. These algorithms take as input a real-time data feed and the rules to be monitored. The rules are written in an input language that lets users express, simply and intuitively, temporal and data dependencies between events in the feed. Whenever a rule is violated, the algorithms output a compact description of the data that caused the violation.

Lay summary

Ensuring compliance is crucial, and companies have entire departments dedicated to overseeing it. Working from voluminous log files and other inputs, these departments must quickly and reliably determine whether procedures are being followed in accordance with the (possibly complex) rules in force. A typical rule for a bank might be: "No customer may withdraw more than 5000 Swiss francs per week." Being able to pinpoint individual violations is of considerable value.
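To make the example rule concrete, here is a minimal sketch of how a single, hard-coded rule could be checked against a stream of events. It assumes that "per week" means any sliding 7-day window and that withdrawals arrive in timestamp order. The `Withdrawal` event format and the `monitor` function are hypothetical illustrations; they are not the project's input language or its monitoring algorithm.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

DAY = 24 * 60 * 60   # seconds
WEEK = 7 * DAY       # assumption: "per week" = any sliding 7-day window
LIMIT_CHF = 5000


@dataclass(frozen=True)
class Withdrawal:
    """Hypothetical event format for the data feed."""
    timestamp: int   # seconds since an arbitrary epoch
    customer: str
    amount: int      # Swiss francs


Violation = Tuple[str, int, int, Tuple[Withdrawal, ...]]


def monitor(events: Iterable[Withdrawal],
            limit: int = LIMIT_CHF,
            window: int = WEEK) -> Iterator[Violation]:
    """Report each withdrawal after which a customer's total over the
    interval (t - window, t] exceeds `limit`.

    Assumes events arrive in non-decreasing timestamp order. Each report
    carries the withdrawals still in the window as compact evidence.
    """
    recent = defaultdict(deque)  # customer -> withdrawals still in the window
    totals = defaultdict(int)    # customer -> sum of those withdrawals
    for ev in events:
        q = recent[ev.customer]
        # Drop this customer's withdrawals that have fallen out of the window.
        while q and q[0].timestamp <= ev.timestamp - window:
            totals[ev.customer] -= q.popleft().amount
        q.append(ev)
        totals[ev.customer] += ev.amount
        if totals[ev.customer] > limit:
            yield ev.customer, ev.timestamp, totals[ev.customer], tuple(q)


feed = [
    Withdrawal(0 * DAY, "alice", 3000),
    Withdrawal(1 * DAY, "bob", 4000),
    Withdrawal(2 * DAY, "alice", 2500),  # alice: 5500 within 7 days
    Withdrawal(9 * DAY, "alice", 1000),  # her earlier withdrawals have expired
]
for customer, ts, total, evidence in monitor(feed):
    print(customer, ts // DAY, total, [w.amount for w in evidence])
    # prints: alice 2 5500 [3000, 2500]
```

Keeping a running total per customer means each event is processed in amortised constant time, however long the feed runs, and the reported evidence is limited to the withdrawals that actually add up to the excess. In this simplified sketch, an idle customer's old withdrawals are only discarded when that customer next appears; rules written in a more expressive input language can require considerably more elaborate state than a single sliding sum.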

We are developing algorithms that continually monitor incoming data for rule violations. The more complex the rule, the harder it is to check efficiently against enormous volumes of data. The expressiveness of the input language determines how complex the rules can be, and therefore how efficient the monitoring algorithm can be. Our goal is to develop efficient monitoring algorithms for highly expressive, and hence practically useful, input languages.

The problem of automated rule monitoring can be approached from two directions. On the one hand, theoretical research is increasing the expressiveness of input languages for rules and developing algorithms for these languages. However, whether these algorithms scale to huge data volumes is frequently neglected. On the other hand, practical research focuses on implementing scalable algorithms that run in parallel on computer clusters. There, the design of the input language often receives less attention than it merits. Our project combines the two approaches to the benefit of both.