Security and transparency for AI-based solutions

While AI, most often in the form of machine learning (ML), is becoming a common tool for security applications in which data must be analysed, clustered or attributed, the security of the applied algorithms themselves is often limited. Research results of recent years reveal shortcomings of trained neural networks, such as a lack of robustness against targeted attacks. The risk of privacy loss is also discussed in the context of machine learning: potential leakage of training data from trained networks as well as de-anonymization of complex data sets with the help of machine learning.

This leads to a lack of trust and acceptance by the public: ML is perceived as a risk and a threat, a mechanism that screens large parts of everyday life without oversight. Under these circumstances, using the potential of ML for security solutions and other sensitive applications becomes challenging.

In IT security, the use of ML is already established in multiple domains today. Spam detection is a well-known example, where support vector machines try to distinguish wanted from unwanted emails. Author attribution combines natural-language forensics and machine learning. Deep learning helps to identify illicit images and improves malware detection as well as network intrusion detection.
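
To illustrate this established use case, the following minimal sketch trains a linear support vector machine on bag-of-words features to separate wanted from unwanted emails. The example messages and labels are made up for illustration; a real spam filter would be trained on large labelled corpora.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy corpus: 0 = wanted (ham), 1 = unwanted (spam)
    emails = [
        "Meeting moved to 3pm, see the updated agenda",
        "Win a free prize now, click this link",
        "Quarterly report attached for your review",
        "Cheap pills, limited offer, act immediately",
    ]
    labels = [0, 1, 0, 1]

    # Bag-of-words (TF-IDF) features fed into a linear support vector machine
    classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
    classifier.fit(emails, labels)

    print(classifier.predict(["Claim your free prize today"]))  # expected: [1]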


Machine learning as a target of attacks and a tool for attackers

Today, ML algorithms and trained networks are becoming targets of attacks. Various approaches try to mislead or influence ML-based decisions, which requires IT-security countermeasures that protect the core assets of ML.

ML is also becoming a tool for attackers. IT security needs to be prepared for attacks that adapt more quickly to complex security measures, just as intrusion detection systems aim to identify complex attacks with the help of ML.

Adversarial machine learning will become more common in IT security. Whenever a security challenge can be described with a relatively simple concept and addressed by machine learning, the opposing side, be it defender or attacker, will use adversarial machine learning to efficiently identify weaknesses in the other party's strategy and deploy specialized attacks or defenses against it.
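
As an illustration of how adversarial machine learning probes a model's weaknesses, the following sketch applies the fast gradient sign method (FGSM), one common way of crafting adversarial inputs. The model, input and label are placeholders for illustration and are not SenPAI components.

    import torch
    import torch.nn.functional as F

    def fgsm_example(model, x, label, epsilon=0.03):
        """Perturb input x so the (placeholder) classifier is more likely to err."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), label)
        loss.backward()
        # Step in the direction that increases the loss, bounded by epsilon
        x_adv = x + epsilon * x.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()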


Improving the security of technology and applications based on machine learning

SenPAI addresses the subject of security and ML from two perspectives: The first perspective is improving the security of ML algorithms and systems based on ML. This does not include standard system security, which is a generic requirement for all IT systems. The focus is on security challenges that are especially and in some cases exclusively relevant to ML. The term "security" has to be seen in a broad sense here, as issues like privacy leakage or transparency of decisions shall also be addressed.

The second perspective is application-centric. As the National Research Center for Applied Cybersecurity ATHENE focuses on applied security solutions, SenPAI aims to develop and evaluate new security applications based on ML. These applications can and shall also utilize the security mechanisms developed in the technology-centric research projects and give feedback on their usability and performance.

The application-centric projects may raise new challenges for the technology-centric projects. They will also potentially be Big Data projects, as they handle complex data and aim to efficiently derive security-relevant information from it.

The technology-centric projects will focus on research publications and PhD theses; tools will also be developed and implemented. The application-centric projects will focus on building demonstrators and discussing them with the public, governmental organizations and industry.


Research goals of SenPAI

In recent years, AI has experienced rapid growth in performance and fields of application. However, many challenges remain for which no comprehensive solutions exist yet. Within SenPAI, the following aspects of AI and security are potential research goals:

Goal 1

Transparency and traceability of results

Interpreting the results of ML will be an even more important challenge in the future than it is today. As long as ML results cannot be interpreted comprehensibly, decisions based on them remain problematic in many cases.
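
One established way to make results more traceable is to report how strongly each input feature influences a model's decisions. The sketch below uses permutation importance on a synthetic data set as a simple, model-agnostic example; the data and model are placeholders, not SenPAI components.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    # Synthetic stand-in data; in practice this would be the security data set
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    # How much does randomly shuffling each feature degrade the model's score?
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    for i, importance in enumerate(result.importances_mean):
        print(f"feature {i}: importance {importance:.3f}")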

Goal 2

Robustness

In ML, robustness refers to how strongly an algorithm depends on its input values. It describes, for example, how a method behaves under unexpected disturbances or noise in its inputs. Especially if ML is to be used in autonomous systems, high robustness is important in order to counteract unforeseen behavior of the AI. Robustness also concerns the training data: bias and overfitting can result in trained networks that do not provide satisfactory results in practice.
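
A simple robustness check in this spirit measures how a trained classifier's accuracy degrades when Gaussian noise is added to its inputs. The following sketch uses a standard toy data set and classifier purely for illustration.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=2000).fit(X_train, y_train)

    # Accuracy as a function of Gaussian noise added to the test inputs
    rng = np.random.default_rng(0)
    for sigma in (0.0, 1.0, 2.0, 4.0):
        noisy = X_test + rng.normal(0.0, sigma, X_test.shape)
        print(f"noise sigma={sigma}: accuracy={model.score(noisy, y_test):.3f}")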

Goal 3

Availability of training data

It is in the nature of ML that training data is necessary for its use. Many ML methods additionally require annotated training data. However, the availability of such data is limited, which has numerous consequences. Where personal data is concerned, the need for data is matched by the need for data protection. Without reliable methods of anonymizing such data, its use as training material is only possible to a limited extent.
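
As a minimal illustration of one baseline notion of anonymization, the sketch below checks the k-anonymity of a small tabular data set over a set of quasi-identifier columns; the column names and records are hypothetical examples, not project data.

    import pandas as pd

    def k_anonymity(df: pd.DataFrame, quasi_identifiers: list) -> int:
        """Size of the smallest group of records sharing the same quasi-identifiers."""
        return int(df.groupby(quasi_identifiers).size().min())

    # Hypothetical records; zip_code and age_band act as quasi-identifiers
    records = pd.DataFrame({
        "zip_code":  ["60311", "60311", "60313", "60313", "60313"],
        "age_band":  ["30-39", "30-39", "40-49", "40-49", "40-49"],
        "diagnosis": ["A", "B", "A", "C", "B"],
    })
    print("k =", k_anonymity(records, ["zip_code", "age_band"]))  # k = 2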

Goal 4

Guarantee of data protection

Privacy and Big Data are often considered incompatible. However, a robust analysis of how different mechanisms for the technical implementation of data protection actually affect the results of the analyses is still lacking. On the other hand, Big Data and ML challenge the effectiveness of data protection, as they enable links between data sets that can remove supposed anonymity. Furthermore, there is the risk that trained ML networks can reconstruct their training data and thus allow references to individual persons.
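
One such technical mechanism for data protection is differential privacy. The following sketch shows the Laplace mechanism applied to a simple mean query; the data, bounds and epsilon value are hypothetical choices meant only to illustrate the trade-off between privacy and result quality mentioned above.

    import numpy as np

    def private_mean(values, lower, upper, epsilon):
        """Release the mean of bounded values with epsilon-differential privacy."""
        clipped = np.clip(values, lower, upper)
        sensitivity = (upper - lower) / len(clipped)   # sensitivity of the mean query
        noise = np.random.default_rng().laplace(0.0, sensitivity / epsilon)
        return float(clipped.mean() + noise)

    ages = np.array([23, 35, 41, 29, 52, 61, 38])
    print("true mean:   ", ages.mean())
    print("private mean:", private_mean(ages, 0, 100, epsilon=1.0))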

Goal 5

Interdisciplinary: Guidelines for the use of AI

The use of AI will require rules on its behavior in order to prevent prejudice or discrimination. An interdisciplinary discussion of the nature and context of such regulated behavior is necessary to develop guidelines that can be interpreted by a machine and reviewed in case of doubt. In view of the rapid development of AI and the growing number of its use cases, this interdisciplinary discussion must address these fundamental questions as quickly as possible. Otherwise, technical regulations based on engineering concepts but not considering ethical or legal aspects will be implemented. This would significantly reduce public acceptance and thus hinder or slow down the use of the benefits of AI.