Projects in SenPAI

RoMa: Robustness in Machine Learning

RoMa aims to provide mechanisms to improve the security of ML in the application projects. Roma will interact with other technology projects regarding attack and solution models. The objective of RoMa is to increase the robustness of neural networks and other ML algorithms against attacks altering input data during testing phase either to evade correct classification or to enforce a wanted classification.


SecLLM: Security in Large Language Models

This project aims to analyze the security threats in large language models (LLMs) and propose defense mechanisms against them. Currently, the possible vulnerabilities that these models may face remain unknown. Furthermore, even for the identified vulnerabilities, the most effective defense strategies are yet to be determined. We aim to conduct a taxonomy analysis of the current attacks and investigate new potential attacks. In particular, we focus on prompt injection, backdoors, and privacy leaks. For example, it has been shown that it is possible to hijack the behavior of an LLM by introducing hidden prompts through cross-site scripting and, with this, conduct phishing attacks. Due to the exponential adoption of LLMs in commercial applications, these applications could be vulnerable to the security threats of LLMs. Thus, providing security guarantees to ensure trust and safety in these models is of utmost importance.


SePIA: SEcurity and Privacy In Automated OSINT

SePIA is an application project addressing various challenges in Automated Open Source Intelligence. Objectives are the encapsulation of the OSINT process in a secure environment following privacy by design, and the application of advanced crawling and information gathering concepts for automation of searching available data sources including utilizing ML for improving the state of the art in crawling. Further, SePIA deals with improving data cleansing by adding a feedback loop to crawling and analysis modules and improving the analysis methods for automated intelligence results based on ML.


XReLeaS: Explainable Reinforcement Learning for Secure Intelligent Systems

This project addresses the important aspects of transparency as well as explainable results and nets in ML. The aim is to build a software toolbox for explainable ML, also increasing other security aspects of the algorithms. A robotic environment is used as an example.


Protecting Privacy and Sensitive Information in Texts

The goal of this project is to explore Natural Language Processing methods that can dynamically identify and obfuscate sensitive information in texts, with a focus on implicit attributes, for example, their ethnic background, income range, or personality traits. These methods will help to preserve the privacy of all individuals - both authors as well as other persons mentioned in the text. Further, we go beyond specific text sources, like social media, and aim to develop robust and highly adaptable methods that can generalize across domains and registers. Our research program encompasses three areas. First, we will extend the theoretical framework of differential privacy to our implicit text obfuscation scenario. The set of research questions includes fundamental privacy questions related to textual datasets.
Second, we will identify to which extent unsupervised pre-training achieves domain-agnostic privatization. Third, the large gap between formal guarantees and meaningful privacy-preservation capabilities is due to a mismatch between the theoretical bounds and existing evaluation techniques based on attacking the systems.


Expired Projects

Adversarial Attacks on NLP systems

Duration 01.01.2020 - 31.12.2023

The project focused on a second challenge in ML/KI security in which AI systems are utilized as attackers. The focus here (NLP) was on textual data. The project dealt with hate speech and disinformation, which are relevant scenarios in OSINT applications. Results can also be used in the SePIA project, as OSINT is often based on text data. Scientific findings from the project are:

Robustness and debiasing

  • New method called Confidence Regularization, which mitigates known and unknown biases
  • Novel debiasing framework that can handle multiple biases at once

Model efficiency

  • New adapter architecture called Adaptable Adapters that allows an efficient fine-tuning of language models
  • New transformer architecture using Rational Activation Functions

Data collection

  • Analyzing sources of bias and making data collection more inclusive by exploring the potential of citizen scientists

Evaluation of generative models (e.g., LLMs)

  • Identifying pitfalls and issues when using existing inference heuristics for evaluation
  • Developed a novel synthetic data generation framework: FALSESUM