From a privacy point of view, whether a data sample (image, text, audio, etc.) was used to train a (supervised) model is, in many cases, sensitive information. Inferring or extracting this specific piece of information is what we refer to as Membership Inference.
Membership inference can be used both to support or to undermine privacy.
On one hand, it can violate privacy by revealing private information. For example, inferring that a patient’s medical data is in the training set of a disease-prediction model may disclose the patient’s medical condition.
On the other hand, people can use membership inference to check whether their private information was used for training without their permission, thus improving privacy by enabling audits and ensuring accountability in data usage.
A Membership Inference Attack is one of the most popular privacy attacks on AI models: it tries to determine whether an input sample was part of the model’s training dataset or not.
The authors consider the worst-case scenario from an attacker’s perspective in order to demonstrate how realistic the attack is.
Assumptions
A data sample of choice
(image, text, audio, etc.)
Black-box access to a target model
(Only the model’s predictions are accessible, not its internal parameters)
Objective
Attacking black-box models is more complex than attacking white-box models, whose structure and parameters are known to the adversary. The authors evaluated their techniques against popular MLaaS (Machine Learning as a Service) platforms, which provide black-box access to machine learning models via API calls.
The key observation that guided the authors was that machine learning models often behave differently on the data that they were trained on versus the data that they “see” for the first time.
So, they used machine learning itself to detect this difference in the target model’s behavior, thereby converting the membership inference problem into a binary classification problem (an adversarial use of machine learning).
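To make the intuition concrete, here is a minimal sketch (my own illustrative setup, not the authors’ code) that measures the confidence gap between training (“member”) records and unseen (“non-member”) records, using a scikit-learn classifier as a stand-in for the black-box target:

```python
# Hypothetical setup: a scikit-learn MLP stands in for the black-box target;
# only its prediction probabilities are used, as in the black-box setting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

target = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                       random_state=0).fit(X_train, y_train)

conf_members = target.predict_proba(X_train).max(axis=1)      # seen during training
conf_non_members = target.predict_proba(X_test).max(axis=1)   # never seen

# A model that overfits is typically more confident on members than on
# non-members; this gap is the signal the attack model learns to exploit.
print("mean confidence on members:    ", conf_members.mean())
print("mean confidence on non-members:", conf_non_members.mean())
```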
Goal
Construct an Attack Model that can recognize differences in the target model’s predictions on the inputs it was trained on versus the inputs it was not trained on.
Challenge
The adversary has no information about the internal parameters of the target model and only limited query access (black-box access) to it through a public API.
Observation (Experimental)
Similar models trained on relatively similar data records using the same service behave in a similar way.
So, the authors introduced a novel Shadow Training technique that enables training the attack model on proxy targets for which the training dataset is known.
Create multiple Shadow Models to imitate the behavior of the Target Model
Generate training data for the Shadow Models
An adversary can generate synthetic training data using the Target Model (Model-based Synthesis) or statistics about the underlying population (Statistics-based Synthesis). Alternatively, the adversary might have access to a potentially noisy version of the Target Model’s training dataset (Real-world Data). A simplified sketch of Model-based Synthesis is shown below.
In this work, the authors evaluated the worst-case scenario where the shadow training dataset is disjoint from the target model’s training dataset. In the real world, the two datasets would likely overlap, since the adversary has some idea of the data population, and this overlap may result in even better attack performance.
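As a rough illustration of Model-based Synthesis, the sketch below performs a simplified hill-climbing search over a binary feature space, guided only by the target model’s output confidence. The `query_target` function is a hypothetical black-box wrapper around the target model’s prediction API; the paper’s actual algorithm additionally adapts the number of flipped features and samples accepted records probabilistically.

```python
# Simplified sketch of model-based synthesis: hill-climbing in the input space
# guided by the target model's confidence. Assumes binary features and a
# hypothetical query_target(x) wrapper returning the probability vector.
import numpy as np

def synthesize_record(query_target, target_class, n_features,
                      k=4, max_iters=1000, conf_threshold=0.9, rng=None):
    rng = rng or np.random.default_rng()
    x = rng.integers(0, 2, size=n_features)            # random starting record
    best_conf = 0.0
    for _ in range(max_iters):
        candidate = x.copy()
        flip = rng.choice(n_features, size=k, replace=False)
        candidate[flip] = 1 - candidate[flip]           # randomize k features
        probs = query_target(candidate)
        conf = probs[target_class]
        if conf > best_conf:                            # keep improving changes
            x, best_conf = candidate, conf
            # Accept once the target model confidently assigns the target class;
            # the record is then used as shadow training data for that class.
            if best_conf > conf_threshold and int(np.argmax(probs)) == target_class:
                return x
    return None                                         # search budget exhausted
```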
Train (Supervised) the Shadow Models using the generated data
The shadow models must be trained in a way similar to the target model. This is feasible when the target model’s training algorithm and model structure are known. In the case of MLaaS (Machine Learning as a Service), the attacker can use exactly the same service (e.g., Google Prediction API) to train the shadow models as was used to train the target model.
Train (Supervised) the Attack Model on the labeled inputs and outputs of the shadow models
To recognize differences in shadow models’ behavior when these models operate on inputs from their own training datasets versus inputs they did not encounter during training.
This binary classification task can be performed using any state-of-the-art machine learning framework or service.
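Putting the steps together, here is a minimal end-to-end sketch of shadow training (an illustrative setup, not the authors’ implementation). It assumes the adversary holds real-world data from the same population, split into disjoint shadow datasets; for brevity it trains a single attack model with the class label as a feature, whereas the paper trains one attack model per output class.

```python
# Minimal sketch of shadow training: train shadow models on data with known
# membership, then train a binary attack model on their prediction vectors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

N_SHADOW = 5
X, y = make_classification(n_samples=10000, n_features=20, n_classes=4,
                           n_informative=10, random_state=1)

attack_X, attack_y = [], []
rng = np.random.default_rng(1)
shadow_splits = np.array_split(rng.permutation(len(X)), N_SHADOW)

for idx in shadow_splits:
    # Half of each shadow dataset is used for training ("in"), half is held out ("out").
    X_in, X_out, y_in, y_out = train_test_split(X[idx], y[idx], test_size=0.5,
                                                random_state=1)
    shadow = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                           random_state=1).fit(X_in, y_in)
    # Attack features: the shadow model's prediction vector plus the true label;
    # the binary target is member (1) vs. non-member (0).
    for X_part, y_part, member in ((X_in, y_in, 1), (X_out, y_out, 0)):
        feats = np.column_stack([shadow.predict_proba(X_part), y_part])
        attack_X.append(feats)
        attack_y.append(np.full(len(feats), member))

attack_model = RandomForestClassifier(n_estimators=100, random_state=1)
attack_model.fit(np.vstack(attack_X), np.concatenate(attack_y))

# At attack time: query the target model on a candidate record, build the same
# feature vector, and let attack_model predict membership.
```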
The attacks were evaluated on the following datasets: CIFAR-10, CIFAR-100, the Purchase Dataset (based on Kaggle’s Acquire Valued Shoppers challenge), the Location Dataset (from a publicly available set of mobile users’ location “check-ins”), the Texas Hospital Stays Dataset, MNIST, and UCI Adult (Census Income).
The target models were trained using the Google Prediction API, Amazon ML, and a locally implemented neural network.
The authors discuss the following mitigation strategies:
Restrict the prediction vector to the top k classes
The output is still useful to users, but exposes less information to attackers.
Round the classification probabilities in the prediction vector
Increase entropy of the prediction vector
Modify (or add) the softmax layer and increase its normalizing temperature. For a very large temperature, the output becomes almost uniform and nearly independent of the input, thus leaking very little information. So there is a tradeoff between utility and information leakage (see the sketch after this list).
Use regularization to overcome overfitting
From their evaluation, the authors found that overfitting is an important factor in the information leakage of machine learning models, although it is not the only one.
In the realm of Machine Learning, overfitting in itself is a canonical problem that limits the predictive power of the model.
So, if we can mitigate overfitting, we not only increase the accuracy of the model but also tackle the information leakage problem. Thus, instead of trading off privacy against utility, we can improve both.
Differential Privacy
By definition, differentially private training bounds the success of membership inference attacks, but it may significantly reduce the model’s accuracy.
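As an illustration of the prediction-vector mitigations above (top-k filtering, coarse rounding, and softmax temperature scaling), here is a small sketch; the function names are mine, not the paper’s:

```python
# Illustrative sketches of the prediction-vector mitigations (function names
# are hypothetical, not from the paper).
import numpy as np

def restrict_top_k(probs, k=3):
    """Return the prediction vector with only the top-k probabilities kept."""
    out = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]
    out[top] = probs[top]
    return out

def round_probs(probs, digits=2):
    """Round probabilities to a few digits, coarsening the leaked signal."""
    return np.round(probs, digits)

def softmax_with_temperature(logits, t=1.0):
    """Softmax with normalizing temperature t: exp(z_i / t) / sum_j exp(z_j / t).
    A large t pushes the output toward uniform, leaking less about the input."""
    z = np.asarray(logits, dtype=float) / t
    z -= z.max()                       # for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.5, 0.5, -1.0])
print(softmax_with_temperature(logits, t=1.0))    # peaked, higher leakage
print(softmax_with_temperature(logits, t=20.0))   # nearly uniform, low utility
```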
This work proposed a novel technique, Shadow Training, for attacking machine learning models to infer membership.
Through extensive experiments and evaluations, it provided a good understanding of how machine learning models leak information about their training datasets and how different factors contribute to that leakage.
Moreover, the authors suggested, implemented, and evaluated several mitigation strategies. Although these strategies reduced the attack’s effectiveness, they were not enough to fully prevent membership inference attacks, which in turn reinforced the privacy implications addressed in this work.